Re: [Numpy-discussion] guvectorize, a helper for writing generalized ufuncs
There has been some discussion on the Numba mailing list as well about a version of guvectorize that doesn't compile, for testing and flexibility. Having this be inside NumPy itself seems ideal.

-Travis

On Tue, Sep 13, 2016 at 12:59 PM, Stephan Hoyer <sho...@gmail.com> wrote:

> On Tue, Sep 13, 2016 at 10:39 AM, Nathan Goldbaum <nathan12...@gmail.com> wrote:
>
>> I'm curious whether you have a plan to deal with the Python function
>> call overhead. Numba gets around this by JIT-compiling Python functions -
>> is there something analogous you can do in NumPy or will this always be
>> limited by the overhead of repeatedly calling a Python implementation of
>> the "core" operation?
>
> I don't think there is any way to avoid this in NumPy proper, but that's
> OK (it's similar to the existing overhead of vectorize).
>
> Numba already has guvectorize (and its own version of vectorize as well),
> which already does exactly this.

--
*Travis Oliphant, PhD*
*Co-founder and CEO*
@teoliphant
512-222-5440
http://www.continuum.io
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
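For readers following the thread, here is a minimal sketch of what a non-compiling, pure-Python generalized ufunc can look like. It assumes a NumPy version (1.12 or later) in which np.vectorize accepts a signature argument; the names inner1d and ginner are made up for illustration and are not part of the proposal discussed above. As noted in the thread, the core function is still called once per loop element, so this is about convenience and testing, not speed.

```python
import numpy as np

def inner1d(a, b):
    # Core operation: receives 1-d inputs of equal length k, returns a scalar.
    return float(np.dot(a, b))

# Wrap the Python function with the core signature (k),(k)->().
# Loop dimensions are broadcast; the core function runs in Python each time.
ginner = np.vectorize(inner1d, signature="(k),(k)->()")

x = np.arange(6.0).reshape(2, 3)   # two stacked vectors of length 3
y = np.ones(3)                     # one vector, broadcast over the stack
result = ginner(x, y)
```

Here result has one entry per stacked vector in x.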
Re: [Numpy-discussion] Custom Dtype/Units discussion
while I originally wrote NumPy (borrowing heavily from Numeric and drawing inspiration from Numarray and receiving a lot of help for specific modules from many of you), the community has continued to develop NumPy and now has a proper governance model. I am now simply an interested NumPy user and previous NumPy developer who finally has some concrete ideas to share based on work that I have been funding, leading, and encouraging for the past several years.

I am still very interested in helping NumPy progress, but we are also going to be taking these ideas to create a general concept of the "buffer protocol in Python" to enable cross-language code-sharing and more code re-use for data analytics among language communities. This is the concept of "data-fabric", which is pre-alpha vapor-ware at this point but with some ideas expressed at http://datashape.pydata.org and here: https://github.com/blaze/datafabric and is something DyND is enabling.

NumPy itself has a clear governance model, and whether NumPy (the project) adopts any of the new array-computing concepts I am proposing will depend on this community's decisions as well as work done by motivated developers willing to work on prototypes. I will be willing to help get funding for someone motivated to work on this.

Best,

-Travis

> Chuck
Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
Hi all,

At Continuum we are trying to coordinate with Intel about releasing our patches from Accelerate upstream as well, rather than having them redo things we have already done but have just not been able to open source yet.

Accelerate also uses GPU accelerated FFTs and it would be nice if there were a supported NumPy-way of plugging in these optimized approaches. This is not a trivial thing to do, though, and there are a lot of design choices.

We have been giving away Accelerate to academics since it was released but have asked companies to pay for it as a means of generating money to support open source. Several things that used to be in Accelerate only are now already in open-source (e.g. cuda.jit, guvectorize, target='cuda' and target='parallel' in numba.vectorize). I expect this trend will continue. The FFT enhancements are another thing on the list of things to make open source.

I, for one, welcome Intel's contributions and am enthusiastic about their joining the Python development community. In many cases it would be better if they would just pay a company that already has built and tested this capability to release it than develop things themselves yet again. Any encouragement that can be provided to Intel to encourage them in this direction would help.

Many companies are now supporting open-source. Even those that sell some software are still contributing overall to ensure that the total amount of useful open-source software available is increasing.

Best,

-Travis

On Wed, Jun 1, 2016 at 7:42 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Jun 1, 2016 4:47 PM, "David Cournapeau" <courn...@gmail.com> wrote:
> >
> > On Tue, May 31, 2016 at 10:36 PM, Sturla Molden <sturla.mol...@gmail.com> wrote:
> >>
> >> Joseph Martinot-Lagarde <contreba...@gmail.com> wrote:
> >>
> >> > The problem with FFTW is that its license is more restrictive (GPL), and
> >> > because of this may not be suitable everywhere numpy.fft is.
> >>
> >> A lot of us use NumPy linked with MKL or Accelerate, both of which have
> >> some really nifty FFTs. And the license issue is hardly any worse than
> >> linking with them for BLAS and LAPACK, which we do anyway. We could extend
> >> numpy.fft to use MKL or Accelerate when they are available.
> >
> > That's what we used to do in scipy, but it was a PITA to maintain.
> > Contrary to blas/lapack, fft does not have a standard API, hence exposing a
> > consistent API in python, including data layout, involved quite a bit of
> > work.
> >
> > It is better to expose those through 3rd party APIs.
>
> Fwiw Intel's new python distribution thing has numpy patched to use mkl
> for fft, and they're interested in pushing the relevant changes upstream.
>
> I have no idea how maintainable their patches are, since I haven't seen
> them -- this is just from talking to people here at pycon.
>
> -n
[Numpy-discussion] Blog-post that explains what Blaze actually is and where Pluribus project now lives.
I have emailed this list in the past explaining what is driving my open source efforts now. Here is a blog-post that may help some of you understand a little bit of the history of Blaze, DyND, Numba, and other related developments as they relate to scaling up and scaling out array-computing in Python.

http://technicaldiscovery.blogspot.com/2016/03/anaconda-and-hadoop-story-of-journey.html

This post and these projects do not have anything to do with the future of the NumPy and/or SciPy projects, which are now in great hands guiding their community-driven development.

The post is, however, a discussion of additional projects that will hopefully benefit some of you as well, and for which your feedback and assistance is welcome.

Best,

-Travis
Re: [Numpy-discussion] Changes to generalized ufunc core dimension checking
On Wed, Mar 16, 2016 at 3:07 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:

> On Wed, Mar 16, 2016 at 1:48 PM, Travis Oliphant <tra...@continuum.io> wrote:
>
>> On Wed, Mar 16, 2016 at 12:55 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>>> Hi Travis,
>>>
>>> On Mar 16, 2016 9:52 AM, "Travis Oliphant" <tra...@continuum.io> wrote:
>>> >
>>> > Hi everyone,
>>> >
>>> > Can you help me understand why the stricter changes to generalized
>>> > ufunc argument checking no longer allow scalars to be interpreted as
>>> > 1-d arrays in the core-dimensions?
>>> >
>>> > Is there a way to specify in the core-signature that scalars should be
>>> > allowed and interpreted in those cases as an array with all the elements
>>> > the same? This seems like an important feature.
>>>
>>> Can you share some example of when this is useful?
>>
>> Being able to implicitly broadcast scalars to arrays is the core function
>> of broadcasting. This is still very useful when you have a core-kernel
>> and want to pass in a scalar for many of the arguments. It seems that at
>> least in that case, automatic broadcasting should be allowed --- as it
>> seems clear what is meant.
>>
>> While you can use the broadcast* features to get the same effect with the
>> current code-base, this is not intuitive to a user who is used to having
>> scalars interpreted as arrays in other NumPy operations.
>
> The `@` operator doesn't allow that.
>
>> It used to automatically happen and a few people depended on it in
>> several companies and so the 1.10 release broke their code.
>>
>> I can appreciate that in the general case, allowing arbitrary
>> broadcasting on the internal core dimensions can create confusion. But,
>> scalar broadcasting still makes sense.
>
> Mixing array multiplications with scalar broadcasting is looking for
> trouble. Array multiplication needs strict dimensions and having stacked
> arrays and vectors was one of the prime objectives of gufuncs. Perhaps what
> we need is a more precise notation for broadcasting, maybe `*` or some such
> addition to the signature to indicate that scalar broadcasting is
> acceptable.

I think that is a good idea. Let the user decide if scalar broadcasting is acceptable for their function.

Here is a simple concrete example where scalar broadcasting makes sense:

A 1-d dot product (the core of np.inner): (k), (k) -> ()

A user would assume they could call this function with a scalar in either argument and have it broadcast to a 1-d array. Of course, if both arguments are scalars, then it doesn't make sense.

Having a way for the user to allow scalar broadcasting seems sensible and a nice compromise.

-Travis

> Chuck
[Numpy-discussion] Changes to generalized ufunc core dimension checking
Hi everyone,

Can you help me understand why the stricter changes to generalized ufunc argument checking no longer allow scalars to be interpreted as 1-d arrays in the core-dimensions?

Is there a way to specify in the core-signature that scalars should be allowed and interpreted in those cases as an array with all the elements the same? This seems like an important feature.

Here's an example:

myfunc with core-signature (t),(k),(k) -> (t)

called with myfunc(arr1, arr2, scalar2).

This used to work in 1.9 and before, and scalar2 was interpreted as a 1-d array the same size as arr2. It no longer works with 1.10.0, but I don't see why that is an improvement.

Thoughts? Is there a work-around that doesn't involve creating a 1-d array the same size as arr2 and filling it with scalar2?

Thanks.

-Travis
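One possible work-around, sketched here under assumptions: np.broadcast_to (available from NumPy 1.10) produces a read-only, zero-stride view of the scalar with the same shape as arr2, so no filled array is allocated. The gufunc below is only a hypothetical stand-in for myfunc, built with np.vectorize and the same core signature; the real myfunc is not shown in this message.

```python
import numpy as np

def _core(x, w, v):
    # Hypothetical core operation: scale x (length t) by the dot product
    # of w and v (both length k).
    return x * np.dot(w, v)

# Stand-in for myfunc with core signature (t),(k),(k)->(t).
myfunc = np.vectorize(_core, signature="(t),(k),(k)->(t)")

arr1 = np.arange(3.0)
arr2 = np.arange(4.0)
scalar2 = 2.0

# View of the scalar with arr2's shape; every element aliases the same memory.
scalar2_view = np.broadcast_to(scalar2, arr2.shape)
result = myfunc(arr1, arr2, scalar2_view)
```

The view costs O(1) memory because its stride along the core dimension is zero.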
Re: [Numpy-discussion] Changes to generalized ufunc core dimension checking
On Wed, Mar 16, 2016 at 12:55 PM, Nathaniel Smith <n...@pobox.com> wrote:

> Hi Travis,
>
> On Mar 16, 2016 9:52 AM, "Travis Oliphant" <tra...@continuum.io> wrote:
> >
> > Hi everyone,
> >
> > Can you help me understand why the stricter changes to generalized ufunc
> > argument checking no longer allow scalars to be interpreted as 1-d
> > arrays in the core-dimensions?
> >
> > Is there a way to specify in the core-signature that scalars should be
> > allowed and interpreted in those cases as an array with all the elements
> > the same? This seems like an important feature.
>
> Can you share some example of when this is useful?

Being able to implicitly broadcast scalars to arrays is the core function of broadcasting. This is still very useful when you have a core-kernel and want to pass in a scalar for many of the arguments. It seems that at least in that case, automatic broadcasting should be allowed --- as it seems clear what is meant.

While you can use the broadcast* features to get the same effect with the current code-base, this is not intuitive to a user who is used to having scalars interpreted as arrays in other NumPy operations.

It used to automatically happen and a few people depended on it in several companies and so the 1.10 release broke their code.

I can appreciate that in the general case, allowing arbitrary broadcasting on the internal core dimensions can create confusion. But, scalar broadcasting still makes sense.

> A better workaround would be to use one of the np.broadcast* functions to
> request exactly the broadcasting you want and make an arr2-sized view of
> the scalar. In this case where you presumably (?) want to allow the last
> two arguments to be broadcast against each other arbitrarily:
>
>     arr2, arr3 = np.broadcast_arrays(arr2, scalar)
>     myufunc(arr1, arr2, arr3)
>
> A little wordier than implicit broadcasting, but not as bad as manually
> creating an array, and like implicit broadcasting the memory overhead is
> O(1) instead of O(size).

Thanks for the pointer (after I wrote the email this solution also occurred to me).

I think adding back automatic broadcasting for the scalar case makes a lot of sense as well, however. What do people think of that?

It would also help to add this example to the documentation as a work-around for people whose code breaks with the new changes.

Thanks,

-Travis

> -n
Re: [Numpy-discussion] Changes to generalized ufunc core dimension checking
On Thu, Mar 17, 2016 at 4:41 PM, Stephan Hoyer <sho...@gmail.com> wrote:

> On Thu, Mar 17, 2016 at 1:04 AM, Travis Oliphant <tra...@continuum.io> wrote:
>
>> I think that is a good idea. Let the user decide if scalar
>> broadcasting is acceptable for their function.
>>
>> Here is a simple concrete example where scalar broadcasting makes sense:
>>
>> A 1-d dot product (the core of np.inner): (k), (k) -> ()
>>
>> A user would assume they could call this function with a scalar in either
>> argument and have it broadcast to a 1-d array. Of course, if both
>> arguments are scalars, then it doesn't make sense.
>>
>> Having a way for the user to allow scalar broadcasting seems sensible and
>> a nice compromise.
>>
>> -Travis
>
> To generalize a little bit, consider the entire family of weighted
> statistical functions (mean, std, median, etc.). For example, the gufunc
> version of np.average is basically equivalent to np.inner with a bit of
> preprocessing.
>
> Arguably, it *could* make sense to broadcast weights when given a scalar:
> np.average(values, weights=1.0 / len(values)) is pretty unambiguous.
>
> That said, adding an explicit "scalar broadcasting OK flag" seems like a
> hack that will need even more special logic (e.g., so we can error if both
> arguments to np.inner are scalars).
>
> Multiple dispatch for gufunc core signatures seems like the cleaner
> solution. If you want np.inner to handle scalars, you need to supply core
> signatures (k),()->() and (),(k)->() along with (k),(k)->(). This is
> similar to the vision of three core signatures for np.matmul: (i),(i,j)->(j),
> (i,j),(j)->(i) and (i,j),(j,k)->(i,k).
>
> Maybe someone will even eventually get around to adding an axis/axes
> argument so we can specify these core dimensions explicitly. Writing
> np.inner(a, b, axes=((-1,), ())) could trigger the (k),()->() signature
> even if the second argument is not a scalar (it should be broadcast against
> "a" instead).

That's a great idea! Adding multiple-dispatch capability for this case could also solve a lot of issues that right now prevent generalized ufuncs from being the mechanism of implementation of *all* NumPy functions.

-Travis
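To make the multiple-dispatch idea concrete, here is a rough pure-Python sketch, not an existing NumPy feature. The function inner_dispatch is a made-up name. It chooses among the three core signatures mentioned above, (k),(k)->(), (k),()->() and (),(k)->(), purely by checking whether an argument is 0-dimensional, and it rejects the case where both arguments are scalars. It does not attempt the axes-based selection suggested in the quoted message.

```python
import numpy as np

def inner_dispatch(a, b):
    # Hypothetical dispatcher over three core signatures of a 1-d dot product.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 0 and b.ndim == 0:
        # No core dimension k can be inferred from two scalars.
        raise ValueError("at least one argument must have a core dimension")
    if a.ndim == 0:
        # (),(k)->(): the scalar acts as a constant vector of length k.
        return a * b.sum(axis=-1)
    if b.ndim == 0:
        # (k),()->()
        return a.sum(axis=-1) * b
    # (k),(k)->(), with ordinary broadcasting over any leading dimensions.
    return (a * b).sum(axis=-1)

both_vectors = inner_dispatch([1, 2, 3], [1, 2, 3])
scalar_right = inner_dispatch([1, 2, 3], 2)
scalar_left = inner_dispatch(2, [1, 2, 3])
```

Dispatching on ndim == 0 alone cannot tell a stack of scalars from a single vector, which is the ambiguity an explicit axes argument would resolve.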
Re: [Numpy-discussion] 1.11.0 release notes.
Impressive work! Thank you for all the hard work that went into these improvements and releases.

-Travis

On Wed, Jan 20, 2016 at 12:32 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:

> Hi All,
>
> I've put up a PR with revised 1.11.0 release notes at
> https://github.com/numpy/numpy/pull/7073. I would appreciate it if anyone
> involved in the 1.11 release would take a look and note anything missing
> that they think should be included or things that are misrepresented.
>
> Chuck
Re: [Numpy-discussion] Should I use pip install numpy in linux?
On Thu, Jan 14, 2016 at 12:58 PM, Matthew Brett <matthew.br...@gmail.com> wrote:

> On Thu, Jan 14, 2016 at 9:14 AM, Chris Barker - NOAA Federal
> <chris.bar...@noaa.gov> wrote:
> >>> Also, you have the problem that there is one PyPi -- so where do you put
> >>> your nifty wheels that depend on other binary wheels? you may need to fork
> >>> every package you want to build :-(
> >>
> >> Is this a real problem or a theoretical one? Do you know of some
> >> situation where this wheel to wheel dependency will occur that won't
> >> just be solved in some other way?
> >
> > It's real -- at least during the whole bootstrapping period. Say I
> > build a nifty hdf5 binary wheel -- I could probably just grab the name
> > "libhdf5" on PyPI. So far so good. But the goal here would be to have
> > netcdf and pytables and GDAL and who knows what else then link against
> > that wheel. But those projects are all supported by different people,
> > that all have their own distribution strategy. So where do I put
> > binary wheels of each of those projects that depend on my libhdf5
> > wheel? _maybe_ I would put it out there, and it would all grow
> > organically, but neither the culture nor the tooling support that
> > approach now, so I'm not very confident you could gather adoption.
>
> I don't think there's a very large amount of cultural work - but some
> to be sure.
>
> We already have the following on OSX:
>
> pip install numpy scipy matplotlib scikit-learn scikit-image pandas h5py
>
> where all the wheels come from pypi. So, I don't think this is really
> outside our range, even if the problem is a little more difficult for
> Linux.
>
> > Even beyond the adoption period, sometimes you need to do stuff in
> > more than one way -- look at the proliferation of channels on
> > Anaconda.org.
> >
> > This is more likely to work if there is a good infrastructure for
> > third parties to build and distribute the binaries -- e.g.
> > Anaconda.org.
>
> I thought that Anaconda.org allows pypi channels as well?

It does: http://pypi.anaconda.org/

-Travis

> Matthew
Re: [Numpy-discussion] Should I use pip install numpy in linux?
to do the same. The only thing I have heard are "chicken-and-egg" stories that come down to "we want people to be able to use pip." So, good, then let's make it so that pip can install conda packages and that conda packages with certain restrictions can be hosted on pypi or anywhere else that you have an "index". At least if there were valid reasons they could be addressed. But, this head-in-the-sand attitude towards a viable technology that is freely available is really puzzling to me.

There are millions of downloads of Anaconda and many millions of downloads of conda packages each year. That is just with one company doing it. There could be many millions more with other companies and organizations hosting conda packages and indexes. The conda user-base is already very large. A great benefit to the Python ecosystem would be to allow pip users and conda users to share each other's work --- rather than to spend time reproducing work that is already done and freely available.

-Travis

> -n
Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?
__test
>>
>> So, the final disk usage is quite similar to NPZ, but it can store and
>> retrieve lots faster. Also, the data decompression speed is on par with
>> using no compression. This is because bcolz uses Blosc behind the scenes,
>> which is much faster than zlib (used by NPZ) -- and sometimes faster than a
>> memcpy(). However, even though we are doing I/O against the disk, this
>> dataset is so small that it fits in the OS filesystem cache, so the
>> benchmark is actually checking I/O at memory speeds, not disk speeds.
>>
>> In order to do a more real-life comparison, let's use a dataset that is
>> much larger than the amount of memory in my laptop (8 GB):
>>
>> $ PYTHONPATH=. python key-store.py -f bcolz -m 100 -k 5000 -d /media/faltet/docker/__test -l 0
>> ## Checking method: bcolz (via ctable(clevel=0, cname='blosclz')
>>
>> Building database. Wait please...
>> Time (creation) --> 133.650
>> Retrieving 100 keys in arbitrary order...
>> Time ( query) --> 2.881
>> Number of elements out of getitem: 91907396
>> faltet@faltet-Latitude-E6430:~/blosc/bcolz$ du -sh /media/faltet/docker/__test
>>
>> 39G /media/faltet/docker/__test
>>
>> and now, with compression on:
>>
>> $ PYTHONPATH=. python key-store.py -f bcolz -m 100 -k 5000 -d /media/faltet/docker/__test -l 9
>> ## Checking method: bcolz (via ctable(clevel=9, cname='blosclz')
>>
>> Building database. Wait please...
>> Time (creation) --> 145.633
>> Retrieving 100 keys in arbitrary order...
>> Time ( query) --> 1.339
>> Number of elements out of getitem: 91907396
>> faltet@faltet-Latitude-E6430:~/blosc/bcolz$ du -sh /media/faltet/docker/__test
>>
>> 12G /media/faltet/docker/__test
>>
>> So, we are still seeing the 3x compression ratio. But the interesting
>> thing here is that the compressed version works 50% faster than the
>> uncompressed one (13 ms/query vs 29 ms/query). In this case I was using an
>> SSD (hence the low query times), so the compression advantage is even more
>> noticeable than when using memory as above (as expected).
>>
>> But anyway, this is just a demonstration that you don't need heavy tools
>> to achieve what you want. And as a corollary, (fast) compressors can save
>> you not only storage, but processing time too.
>>
>> Francesc
>>
>> 2016-01-14 11:19 GMT+01:00 Nathaniel Smith <n...@pobox.com>:
>>
>>> I'd try storing the data in hdf5 (probably via h5py, which is a more
>>> basic interface without all the bells-and-whistles that pytables
>>> adds), though any method you use is going to be limited by the need to
>>> do a seek before each read. Storing the data on SSD will probably help
>>> a lot if you can afford it for your data size.
>>>
>>> On Thu, Jan 14, 2016 at 1:15 AM, Ryan R. Rosario <r...@bytemining.com> wrote:
>>> > Hi,
>>> >
>>> > I have a very large dictionary that must be shared across processes
>>> > and does not fit in RAM. I need access to this object to be fast. The key
>>> > is an integer ID and the value is a list containing two elements, both of
>>> > them numpy arrays (one has ints, the other has floats). The key is
>>> > sequential, starts at 0, and there are no gaps, so the “outer” layer of
>>> > this data structure could really just be a list with the key actually being
>>> > the index. The lengths of each pair of arrays may differ across keys.
>>> >
>>> > For a visual:
>>> >
>>> > {
>>> >   key=0:
>>> >     [
>>> >       numpy.array([1,8,15,…, 16000]),
>>> >       numpy.array([0.1,0.1,0.1,…,0.1])
>>> >     ],
>>> >   key=1:
>>> >     [
>>> >       numpy.array([5,6]),
>>> >       numpy.array([0.5,0.5])
>>> >     ],
>>> >   …
>>> > }
>>> >
>>> > I’ve tried:
>>> > - manager proxy objects, but the object was so big that
>>> >   low-level code threw an exception due to format and monkey-patching wasn’t
>>> >   successful.
>>> > - Redis, which was far too slow due to setting up connections
>>> >   and data conversion etc.
>>> > - Numpy rec arrays + memory mapping, but there is a restriction
>>> >   that the numpy arrays in each “column” must be o
Re: [Numpy-discussion] Should I use pip install numpy in linux?
Anaconda "build environment" was set up by Ilan and me. Aaron helped to build packages while he was at Continuum but spent most of his time on the open-source conda project.

It is important to understand the difference between Anaconda and conda in this respect. Anaconda is a particular dependency foundation that Continuum supports and releases -- it will have a particular set of expected libraries on each platform (we work to keep this fairly limited and on Linux currently use CentOS 5 as the base).

conda is a general package manager that is open-source and that anyone can use to produce a set of consistent binaries (there can be many conda-based distributions). It solves the problem of multiple binary dependency chains generally using the concept of "features". This concept of "features" allows you to create environments with different base dependencies.

What packages you install when you "conda install" depends on which channels you are pointing to and which features you have "turned on" in the environment. It's a general system that extends the notions that were started by the PyPA.

-Travis

On Sun, Jan 10, 2016 at 10:14 PM, Robert McGibbon <rmcgi...@gmail.com> wrote:

> > Right. There's a small problem which is that the base linux system
> > isn't just "CentOS 5", it's "CentOS 5 and here's the list of libraries
> > that you're allowed to link to: ...", where that list is empirically
> > chosen to include only stuff that really is installed on ~all linux
> > machines and for which the ABI really has been stable in practice over
> > multiple years and distros (so e.g. no OpenSSL).
> >
> > Does anyone know who maintains Anaconda's linux build environment?
>
> I strongly suspect it was originally set up by Aaron Meurer. Who
> maintains it now that he is no longer at Continuum is a good question.
>
> From looking at all of the external libraries referenced by binaries
> included in Anaconda and the conda repos, I am not confident that they
> have a totally strict policy here, or at least not one that is enforced
> by tooling. The sonames I listed here
> <https://mail.scipy.org/pipermail/numpy-discussion/2016-January/074602.html>
> cover all of the external dependencies used by the latest Anaconda
> release, but earlier releases and other conda-installable packages from
> the default channel are not so strict.
>
> -Robert
Re: [Numpy-discussion] isfortran compatibility in numpy 1.10.
As I posted to the github issue, I support #2 as it is the original meaning. The most common case of isfortran that I recall was to support transpositions that needed to occur before calling Fortran-compiled linear algebra routines.

However, with that said, you could also reasonably do #1 and likely have no real problem --- because transposing a 1-d array doesn't have any effect.

In NumPy 1.0.1, isfortran was intended to be True only for arrays with a.ndim > 1. Thus, it would have been possible for someone to rely on that invariant for some other reason.

With relaxed stride checking, this invariant changed because isfortran was implemented by returning True if the F_Contiguous flag was set but the C_Contiguous flag was not (this was only ever previously possible for a.ndim > 1).

If you choose to go with #1, please emphasize in the release notes that isfortran now does not assume a.ndim > 1 but is simply short-hand for a.flags.f_contiguous.

-Travis

On Fri, Oct 30, 2015 at 5:12 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:

> Hi All,
>
> The isfortran function calls a.fnc (Fortran-Not-C), which is implemented
> as F_CONTIGUOUS && !C_CONTIGUOUS. Before relaxed stride checking,
> contiguous multidimensional arrays could not be both and contiguous 1-D
> arrays were always CONTIGUOUS, but this is no longer the case.
> Consequently current isfortran breaks backward compatibility. There are two
> suggested solutions:
>
>    1. Return `a.flags.f_contiguous`. This differs for 1-D arrays, but is
>    most consistent with the name isfortran.
>    2. Return `a.flags.f_contiguous and a.ndim > 1`, which would be
>    backward compatible.
>
> It is also possible to start with 2. but add a FutureWarning and later
> move to 1, which is my preferred solution. See gh-6590
> <https://github.com/numpy/numpy/issues/6590> for the issue.
>
> Thoughts?
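The two options differ only for arrays with fewer than two dimensions. The sketch below spells them out as two helper functions whose names (isfortran_option1, isfortran_option2) are made up for illustration. It deliberately does not call np.isfortran itself, since what that returns depends on the NumPy version under discussion.

```python
import numpy as np

def isfortran_option1(a):
    # Option 1: shorthand for the F-contiguous flag alone.
    return bool(a.flags.f_contiguous)

def isfortran_option2(a):
    # Option 2: F-contiguous and at least two-dimensional.
    return bool(a.flags.f_contiguous and a.ndim > 1)

vec = np.arange(3.0)                                   # 1-d, contiguous both ways
mat_f = np.asfortranarray(np.arange(6.0).reshape(2, 3))  # column-major 2-d
mat_c = np.arange(6.0).reshape(2, 3)                   # row-major 2-d
```

For vec the two options disagree, which is exactly the 1-D case raised in the quoted message; for the two 2-d arrays they agree.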
Re: [Numpy-discussion] Numpy Generalized Ufuncs: Pointer Arithmetic and Segmentation Faults (Debugging?)
Two things that might help you create generalized ufuncs: 1) Look at Numba --- it makes it very easy to write generalized ufuncs in simple Python code. Numba will compile to machine code so it can be as fast as writing in C. Here is the documentation for that specific feature: http://numba.pydata.org/numba-doc/0.21.0/user/vectorize.html#the-guvectorize-decorator. One wart of the interface is that scalars need to be treated as 1-element 1-d arrays (but still use '()' in the signature). 2) Look at the linear algebra module in NumPy which now wraps a bunch of linear-algebra based generalized ufuncs (all written in C): https://github.com/numpy/numpy/blob/master/numpy/linalg/umath_linalg.c.src -Travis On Sun, Oct 25, 2015 at 7:06 AM, <eleanore.yo...@artorg.unibe.ch> wrote: > Dear Numpy maintainers and developers, > > Thanks for providing such a great numerical library! > > I’m currently trying to implement the Dynamic Time Warping metric as a set > of generalised numpy ufuncs, but unfortunately, I have lasting issues with > pointer arithmetic and segmentation faults. Is there any way that I can > use GDB or some such to debug a python/numpy extension? Furthermore: is it > necessary to use pointer arithmetic to access the function arguments (as > seen on http://docs.scipy.org/doc/numpy/user/c-info.ufunc-tutorial.html) > or is element access (operator[]) also permissible? > > To break it down quickly, I need to have a fast DTW distance function > dist_dtw() with two vector inputs (broadcasting should be possible), two > scalar parameters and one scalar output (signature: (i), (j), (), () -> ()) > usable in python for a 1-Nearest Neighbor classification algorithm. The > extension also implements two functions compute_envelope() and > piecewise_mean_reduction() which are used for lower-bounding based on Keogh > and Ratanamahatana, 2005. 
The source code is available at > http://pastebin.com/MunNaP7V and the prominent segmentation fault happens > somewhere in the chain dist_dtw() -> meta_dtw_dist() -> slow_dtw_dist(), > but I fail to pin it down. > > Aside from my primary questions, I wonder how to approach > errors/exceptions and unit testing when developing numpy ufuncs. Are there > any examples apart from the numpy manual that I could use as reference > implementations of generalised numpy ufuncs? > > I would greatly appreciate some insight into properly developing > generalised ufuncs. > > Best, > Eleanore
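For prototyping and unit testing before committing to C, the signature from the question, (i),(j),(),()->(), can also be tried in pure Python. The sketch below is an illustration only and is not the code from the pastebin: it uses np.vectorize with its signature argument (available in NumPy 1.12 and later) rather than Numba or the C API, so it pays Python call overhead on every core call. The names _dtw_core, window and power are hypothetical; the original post does not say what the two scalar parameters are, so a band half-width and a cost exponent are assumed here.

```python
import numpy as np


def _dtw_core(x, y, window, power):
    """Plain dynamic-programming DTW between two 1-d sequences.

    `window` (band half-width) and `power` (cost exponent) are assumed
    stand-ins for the two scalar parameters mentioned in the question.
    """
    n, m = len(x), len(y)
    # The band must be at least |n - m| wide to reach the final cell.
    w = max(int(window), abs(n - m))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(x[i - 1] - y[j - 1]) ** power
            cost[i, j] = d + min(cost[i - 1, j],      # step in x only
                                 cost[i, j - 1],      # step in y only
                                 cost[i - 1, j - 1])  # step in both
    return cost[n, m]


# Core dimensions (i) and (j) are consumed by each call; any leading
# dimensions of the inputs are broadcast against each other.
dist_dtw = np.vectorize(_dtw_core, signature="(i),(j),(),()->()")

if __name__ == "__main__":
    batch = np.random.rand(4, 10)   # four sequences of length 10
    query = np.random.rand(7)       # one sequence of length 7
    print(dist_dtw(batch, query, 3, 2.0).shape)
```

A slow reference like this can serve as the expected value in unit tests for a compiled dist_dtw(), which is one way to approach the testing question raised above.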
[Numpy-discussion] Let's move forward with the current governance document.
Hi everyone, After some further thought and spending quite a bit of time re-reading the discussion on a few threads, I now believe that my request to be on the steering council might be creating more trouble than it's worth. Nothing matters to me more than seeing NumPy continue to grow and improve. So, I'm switching my position to supporting the adoption of the governance model outlined and just contributing as I can outside the steering council. The people on the steering council are committed to the success of NumPy and will do a great job --- they already have in contributing to the community over the past year(s). We can always revisit the question in a year if difficulties arise with the model. If my voice and other strong voices remain outside the council, perhaps we can all help ensure that the intended community governance of NumPy does in fact happen, and that most decisions continue to be made in the open. I had the pleasure last night of meeting one of the new NumPy core contributors, Allan Haldane. This only underscored my confidence in everyone who is contributing to NumPy today. This confidence has already been established by watching the great contributions of many talented developers who have given their time and talents to the project over the past several years. I hope that we can move on from the governance discussion and continue to promote the success of the project together. Best, -Travis
Re: [Numpy-discussion] composition of the steering council (was Re: Governance model request)
Thanks for the candid discussion and for expressing concerns freely. I think Nathaniel's "parenting" characterization of NumPy from me is pretty accurate. I do feel a responsibility for the *stuff* that's out there, and that is what drives me. I do see the many contributions from others and really learn from them as well. I have seen conversations on this list and others with characterizations of history that I don't agree with, which affect decisions that are being made --- and so I feel compelled to try and share my view. I'm in a situation now where at least for 6 months or so I can help with NumPy more than I have been able to for 7 years. Focusing on the initial governance text, my issues are that 1) 1 year of inactivity to be removed from the council is too little for a long-running project like NumPy --- somewhere between 2 and 4 years would be more appropriate. I suppose 1 year of inactivity is fine if that is defined only as "failure to vote on matters before the council". 2) The seed council should not just be recent contributors but should include as many people as are willing to help who have a long history with the project. 3) I think people who contribute significantly generally should be able to re-join the steering council more easily than "going through the 1-year vetting process" again --- they would have to be approved by the current steering council but it should not be automatically disallowed (thus requiring the equivalent of an amendment to change it). I applaud the fact that the steering council will not be and should not be used except when absolutely necessary and for limited functions. Thanks, -Travis On Tue, Sep 29, 2015 at 4:06 AM, Nathaniel Smith <n...@pobox.com> wrote: > On Fri, Sep 25, 2015 at 7:15 AM, Thomas Caswell <tcasw...@gmail.com> > wrote: > > To respond to the devil's advocate: > > > > Creating this organizational framework is a one time boot-strapping > event. 
> > You could use wording like "The initial council will include those who > have > > made significant contributions to numpy in the past and want to be on > it" or > > "The initial council will be constructed by invitation by Nathaniel and > > Chuck". More objective criteria should be used going forward, but in > terms > > of getting things spun up quickly doing things by fiat is probably ok. > I am > > not even sure that the method by which the initial group is formed needs > to > > go into the governing document. > > The problem is that according to the current text, not only is Travis > ineligible to join the council (it's a little weird to put people on > the seed council who wouldn't be eligible to join it normally, but > okay, sure) -- he's not even eligible to stay on the council once he > joins. So we would need to change the text no matter what. > > Which we can do, if we decide that that's what we need to do to > accomplish what we want. It's our text, after all. I think it's > extremely important though that what we actually do, and what we write > down saying we will do, somehow match. Otherwise this whole exercise > has no point. > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] composition of the steering council (was Re: Governance model request)
> > > > [1] Sorry to "footnote" this, but I think I am probably rudely repeating > myself and frankly do **not want this to be discussed**. It is just to > try to be fully clear where I come from: > Until SciPy 2015, I could list many people on this list who have shown > more direct involvement in numpy than Travis since I joined and have no > affiliation to numpy. If Travis had been new to the community at the > time, I would be surprised if I would even recognize his name. > I know this is only half the picture and Travis already mentioned > another side, but this is what I mostly saw even if it may be a harsh > and rude assessment. > > I do understand this. That's actually why I'm speaking up, because I don't think my activity has been understood by many people who have joined this list only recently. I don't want to interfere with your activity or impede your progress, or to be asked permission for anything. In fact, I want to understand how to best use my limited time to support things. You in particular are interested in indexing and fixing it --- the current code is there for a reason and some of the issues being discussed today have been discussed before --- though we have the benefit of hindsight now. I have mostly been behind the scenes helping people since about 2010 --- but still thinking a lot about NumPy, the downstream community, integration with other libraries, and where things could go. I don't have the time to commit major code changes, but I do have the time to contribute perspective and even a design idea or two from time to time. Obviously, nobody has to listen. I understand and appreciate that there are a lot of people that have contributed code and discussion since 2009 and to them it probably seems I'm just popping in and out --- and if you only look at the contributor log you can wonder "who is this guy...". But, I did do *a lot* of work to get NumPy off the ground. 
Quite a bit of that work was very lonely with people interested in the merger but pretty skeptical until the work was nearly done (and then many people helped finish it and get it working and tested). I wish I had been a better architect at the time (I can see now many things that would have been done differently). But, I'm still proud of the work I did in creating a foundation many could build on --- at the time nobody else was stepping up to do the job. Since that time, I have remained very interested in the success of NumPy and supporting the many *users* of NumPy. What I most bring to the current community is having observed many, many uses of NumPy in the wild --- from people who would never post to this list and whose use-cases are absent from discussion or misunderstood. I also bring knowledge about the wider Python ecosystem and the broader world outside of NumPy alone. The group is free to take my ideas and/or contributions or leave them. And I am also free to just review pull requests and contribute if and when I might. Best, -Travis
Re: [Numpy-discussion] Governance model request
On Wed, Sep 23, 2015 at 3:02 AM, Fernando Perez <fperez@gmail.com> wrote: > Hi all, > > I would like to pitch in here, I am sorry that I didn't have the time > before... > > First, I want to disclose that recently Continuum made a research gift to > the Jupyter project; we were just now writing up a blog post to acknowledge > this, but in light of this discussion, I feel that I should say this up > front so folks can gauge any potential bias accordingly. > > > On Tue, Sep 22, 2015 at 3:44 AM, Travis Oliphant <tra...@continuum.io> > wrote: > >> I'm actually offended that so many at BIDS seem eager to crucify my >> intentions when I've done nothing but give away my time, my energy, my >> resources, and my sleep to NumPy for many, many years. I guess if your >> intent is to drive me away, then you are succeeding. > > > Travis, first, I'd like to kindly ask you not to conflate BIDS, an > institution where a large number of people work, with the personal opinions > of some, who happen to work there but in this case are speaking only for > themselves. You say "so many at BIDS", but as far as I know, your > disagreements are with Stefan and Nathaniel (Matthew doesn't work at > BIDS). You are painting with a very wide brush the work of many people, > and in the process, unfairly impacting others who have nothing to do with > this. > I accept that criticism and apologize for doing that. My *human* side was coming out, and I was not being fair. In my head, though, I was also trying to illustrate how some seemed to be doing the same thing for Continuum or other companies. This did not come out very artfully in the early morning hours. I'm sorry. BIDS is doing a lot for the community --- the recent DS4DS workshop, for example, was a spectacularly useful summit --- I hope that many different write-ups and reports of the event make their way out into the world. > > > 1. I hope the discussion can move past the suspicion and innuendo about > Continuum and Travis. 
I haven't always agreed with how Travis communicates > some of his ideas, and I've said it to him in such instances (e.g. this > weekend, as I myself was surprised at how his last round of comments had > landed on the list a few days back). But I also have worked closely with > him for years because I know that he has proven, not in words, but in > actions, that he has the best interests of our community at heart, and that > he is willing to try and do everything in his power to help whenever he > can. > I really hope it's just a perception problem (perhaps on my end). There are challenges with working in the commercial world (there are a lot of things to do that have nothing to do with the technology creation) and communicating on open-source mailing lists. As many have noticed, despite my intentions to contribute, I really can't do the same level of contribution personally that I could when I was a student and a professor and had more time. However, I think that it is also under-appreciated (or mis-understood) how much time I have spent with training and helping people who have contributed instead. It's important to me to build a company that can sponsor people to work on open-source (in a community setting). We are still working on that, but it has been my intent. So far, it's actually easier to sponsor new projects than it is to sponsor people on old projects. I am quite sure that if Continuum had put 3 people full time on NumPy in 2012, there would have been a lot of back-lash and mis-understanding. That's why we didn't do it. The collateral effect of that was the creation of other tools that could be somewhat competitive with NumPy long term -- or not. I'd like to learn how to work with the community in an optimal way so that everyone benefits --- and progress happens. That's also why we created NumFOCUS --- though it is ironic that NumPy has been one of the last projects to actually sign up and be a formally sponsored project. > 2. 
Conflicts of interest are a fact of life, in fact, I would argue that > every healthy and sufficiently interconnected community eventually *should* > have conflicts of interest. They are a sign that there is activity across > multiple centers of interest, and individuals with connections in multiple > areas of the community. And we *want* folks who are engaged enough > precisely to have such interests! > > For conflict of interest management, we don't need to reinvent the wheel, > this is actually something where our beloved institutions, blessed be their > bureaucratic souls, have tons of training materials that happen to be not > completely useless. Most universities and the national labs have > information on COIs that provides guidelines, and Numpy could include in >
Re: [Numpy-discussion] Steering Committee Size
On Wed, Sep 23, 2015 at 3:25 AM, Sebastian Berg wrote: > Hi, > > Trying to figure out at least a bit from the discussions. While I am > happy with the draft, I wonder if someone has some insights about some > questions: > > 1. How large crowds have examples of working well with apache style voting? > I don't have experience with this, other than the Python mailing list, where it works fine. > 2. How large do we expect numpy steering council to be (I have always > thought about 10). > I don't know. As far as I'm aware, this is not set. > 3. More on opinions, how large does the community feel is too large (so > that we should maybe elect people). > > And, maybe more as a discussion point, does the community feel that those > who would be/are effectively now in the Steering Council do not sufficiently > represent old time contributors who were not active in the past year(s). > As I mentioned before, I am happy to serve on the initial seed council to help transition more fully to this style of governance. -Travis
Re: [Numpy-discussion] Governance model request
> > > One last time, it was *not* a personal reference to you: the only reason I > mentioned your names was because of the Berkeley clarification regarding > BIDS that I asked of Travis, that's all. If that comment hadn't been made, > I would not have made any mention whatsoever of anyone in particular. I > apologize for not foreseeing that this would have made you feel singled > out, in retrospect, I should have. > > In my mind, it was the opposite, as I felt that you had every right to > express whatever opinions you have speaking for yourselves, independent of > your affiliations, and I was simply asking Travis to separate individuals > from institutions. But I should have realized that calling anyone out by > name in a context like this is a bad idea regardless. > > This was my fault for not being more careful in my words. I felt multiple things when I wrote my emails that led to incorrectly chosen words --- but mostly I was feeling unappreciated, attacked, and accused. I'm sure now that was not intended --- but there have been mis-understandings. I expect they will happen again. I know if we listen to each other and trust that while we may see the world differently and have different framings of solutions --- we can work to coordinate on an important technical activity together. In retrospect, my initial email requesting inclusion on the seed council could have been worded better (as there were multiple things conflated together). I am responding to the actual text of the governance document in the other thread so as to clarify what my proposal actually is in the context of that document. -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] interpretation of the draft governance document (was Re: Governance model request)
Hi Nathaniel, Thanks for the clarifications. Is the governance document committed to the repository? I keep looking for it and have a hard time finding it --- I think I read it last in an email. If it were in the repository, I could make Pull Requests to the governance document if there are concrete suggestions for change, and then have them reviewed in the standard way. I'm hopeful that a few tweaks to the document would satisfy all my concerns. Thanks, -Travis On Wed, Sep 23, 2015 at 1:04 PM, Nathaniel Smith <n...@pobox.com> wrote: > Hi Travis, > > On Tue, Sep 22, 2015 at 3:08 AM, Travis Oliphant <tra...@continuum.io> > wrote: > > > > > > On Tue, Sep 22, 2015 at 4:33 AM, Nathaniel Smith <n...@pobox.com> wrote: > >> > >> On Tue, Sep 22, 2015 at 1:24 AM, Travis Oliphant <tra...@continuum.io> > >> wrote: > >>> > >>> I actually do agree with your view of the steering council as being > >>> usually not really being needed. You are creating a straw-man by > >>> indicating otherwise. I don't believe a small council should do > anything > >>> *except* resolve disputes that cannot be resolved without one. Like > you, I > >>> would expect that would almost never happen --- but I would argue that > >>> extrapolating from Debian's experience is not actually relevant here. > >> > >> > >> To be clear, Debian was only one example -- what I'm extrapolating from > is > >> every community-driven F/OSS project that I'm aware of. > >> > >> It's entirely possible my data set is incomplete -- if you have some > other > >> examples that you think would be better to extrapolate from, then I'd be > >> genuinely glad to hear them. You may have noticed that I'm a bit of an > >> enthusiast on this topic :-). > >> > > > > > > Yes, you are much better at that than I am. I'm not even sure where I > > would look for this kind of data. > > > >>> > >>> > >>> > >>> So, if the steering council is not really needed then why have it at > all? > >>> Let's just eliminate the concept entirely. 
> >>> > >> > >> In my view, the reasons for having such a council are: > >> 1) The framework is useful even if you never use it, because it means > >> people can run "what if" scenarios in their mind and make decisions on > that > >> basis. In the US legal system, only a vanishingly small fraction of > cases go > >> to the Supreme Court -- but the rules governing the Supreme Court have a > >> huge effect on all cases, because people can reason about what would > happen > >> *if* they tried to appeal to the Supreme Court. > > > > > > O.K. That is a good point. I can see the value in that. > > > > > >> > >> 2) It provides a formal structure for interfacing with the outside > world. > >> E.g., one can't do anything with money or corporate contributions > without > >> having some kind of written-down and enforceable rules for making > decisions > >> (even if in practice you always stick to the "everyone is equal and we > >> govern by consensus" part of the rules). > > > > > > O.K. > > > >> > >> 3) There are rare but important cases where discussions have to be had > in > >> private. The main one is "personnel decisions" like inviting people to > join > >> the council; another example Fernando has mentioned to me is that when > they > >> need to coordinate a press release between the project and a funding > body, > >> the steering council reviews the press release before it goes public. > > > > > > O.K. > > > > > >> > >> That's pretty much it, IMO. > >> > >> The framework we all worked out at the dev meeting in Austin seems to > >> handle these cases well AFAICT. > > > > > > How did we "all" work it out when not everyone was there? This is > where I > > get lost. You talk about community decision making and yet any actual > > decision is always a subset of the community.I suppose you just rely > on > > the "if nobody complains than it's o.k." rule? That really only works > if > > the project is moving slowly. 
> > By "all" I just meant "all of us who were there" (which was a majority > of the active maintainers + a number of other interested parties -- > the list of attendees is in the meeting notes if you're cur
Re: [Numpy-discussion] composition of the steering council (was Re: Governance model request)
> > Regarding the seed council, I just tried to pick an objective > criterion and an arbitrary date that seemed generally in keeping with > idea of "should be active in the last 1-to-2-years-ish". Fiddling with > the exact date in particular makes very little difference -- between > pushing it back to 2 years ago today or forward to 1 year ago today, > the only thing that changes is whether Pauli makes the list or not. > (And Pauli is obviously a great council candidate, though I don't know > whether he even wants to be on it.) > > > Personally, I have no idea how big the council should be. Too big, and > > there is no point, consensus is harder to reach the larger the group, > > and the main (only?) role of the council is to resolve issues where > > consensus has not been reached in the larger community. But what is > > too big? > > > > As for make-up of the council, I think we need to expand beyond people > > who have recently contributed core code. > > > > Yes, the council does need to have expertise to make technical > > decisions, but if you think about the likely contentious issues like > > ABI breakage, a core-code focused view is incomplete. So there should > > be representation by: > > > > Someone(s) with a long history of working with the code -- that > > institutional memory of why decisions were made the way they were > > could be key. > > Sure -- though I can't really imagine any way of framing a rule like > this that *wouldn't* be satisfied by Chuck + Ralf + Pauli, so my guess > is that such a rule would not actually have any effect on the council > membership in practice. > As the original author of NumPy, I would like to be on the seed council as long as it is larger than 7 people. That is my proposal. I don't need to be a permanent member, but I do believe I have enough history that I can understand issues even if I haven't been working on code directly. 
I think I bring history and context that could be helpful on occasion. In addition, if a matter is important enough to even be brought to the attention of this council, I would like to be involved in the discussion about it. It's a simple change to the text --- basically an explanation that Travis requested to be on the seed council. -Travis
Re: [Numpy-discussion] interpretation of the draft governance document (was Re: Governance model request)
Council -- see below. > > Steering Council > > > The Project will have a Steering Council that consists of Project > Contributors who have produced contributions that are substantial in > quality and quantity, and sustained over at least one year. The overall > role of the Council is to ensure, with input from the Community, the > long-term well-being of the project, both technically and as a community. > > During the everyday project activities, council members participate in all > discussions, code review and other project activities as peers with all > other Contributors and the Community. In these everyday activities, Council > Members do not have any special power or privilege through their membership > on the Council. However, it is expected that because of the quality and > quantity of their contributions and their expert knowledge of the Project > Software and Services that Council Members will provide useful guidance, > both technical and in terms of project direction, to potentially less > experienced contributors. > > The Steering Council and its Members play a special role in certain > situations. In particular, the Council may, if necessary: > > - Make decisions about the overall scope, vision and direction of the > project. > - Make decisions about strategic collaborations with other organizations > or individuals. > - Make decisions about specific technical issues, features, bugs and > pull requests. They are the primary mechanism of guiding the code review > process and merging pull requests. > - Make decisions about the Services that are run by The Project and > manage those Services for the benefit of the Project and Community. > - Update policy documents such as this one. > - Make decisions when regular community discussion doesn’t produce > consensus on an issue in a reasonable time frame. > > However, the Council's primary responsibility is to facilitate the > ordinary community-based decision making procedure described above. 
If we > ever have to step in and formally override the community for the health of > the Project, then we will do so, but we will consider reaching this point > to indicate a failure in our leadership. > > ### Council decision making > > If it becomes necessary for the Steering Council to produce a formal > decision, then they will use a form of the [Apache Foundation voting > process](https://www.apache.org/foundation/voting.html). This is a > formalized version of consensus, in which +1 votes indicate agreement, -1 > votes are vetoes (and must be accompanied with a rationale, as above), and > one can also vote fractionally (e.g. -0.5, +0.5) if one wishes to express > an opinion without registering a full veto. These numeric votes are also > often used informally as a way of getting a general sense of people's > feelings on some issue, and should not normally be taken as formal votes. A > formal vote only occurs if explicitly declared, and if this does occur then > the vote should be held open for long enough to give all interested Council > Members a chance to respond -- at least one week. > > In practice, we anticipate that for most Steering Council decisions (e.g., > voting in new members) a more informal process will suffice. > > ### Council membership > > To become eligible to join the Steering Council, an individual must be a > Project Contributor who has produced contributions that are substantial in > quality and quantity, and sustained over at least one year. Potential > Council Members are nominated by existing Council members and voted upon by > the existing Council after asking if the potential Member is interested and > willing to serve in that capacity. The Council will be initially formed > from the set of existing Core Developers who, as of late 2015, have been > significantly active over the last year. > > Concretely, I'm asking to be included in this initial council so a simple "along with Travis Oliphant who is the original author of NumPy". 
If other long-time contributors to the code-base also want to be on this initial seed council, I think it would make sense as well. > When considering potential Members, the Council will look at candidates > with a comprehensive view of their contributions. This will include but is > not limited to code, code review, infrastructure work, mailing list and > chat participation, community help/building, education and outreach, design > work, etc. We are deliberately not setting arbitrary quantitative metrics > (like “100 commits in this repo”) to avoid encouraging behavior that plays > to the metrics rather than the project’s overall well-being. We want to > encourage a diverse array of backgrounds, viewpoints and talents in our > team,
Re: [Numpy-discussion] composition of the steering council (was Re: Governance model request)
On Wed, Sep 23, 2015 at 6:19 PM, Charles R Harris <charlesr.har...@gmail.com> wrote: > > > On Wed, Sep 23, 2015 at 3:42 PM, Chris Barker <chris.bar...@noaa.gov> > wrote: > >> On Wed, Sep 23, 2015 at 2:21 PM, Travis Oliphant <tra...@continuum.io> >> wrote: >> >> >>> As the original author of NumPy, I would like to be on the seed council >>> as long as it is larger than 7 people. That is my proposal. >>> >> >> Or the seed council could invite Travis to join as its first order of >> business :-) >> >> Actually, maybe that's a way to handle it -- declare that the first order >> of business for the seed council is to expand the council. >> > > Perhaps we should specify a yearly meeting to review the past year and > nominate people for commit rights and council membership. Long term, we > might also want to start removing commit rights, perhaps by adding a team > category on github with restricted rights -- committer emeritus, so to > speak. > That's a pretty good idea, actually. > > >> I'd still like some guidelines (suggestions) for history and at least one >> major dependent-on-numpy rep. Travis would certainly meet the history >> requirement -- and maybe the other, too. :-) >> >> >>> It's a simple change to the text --- basically an explanation that >>> Travis requested to be on the seed council. >>> >> >> I'd rather the final draft of the document didn't name names, but no >> biggie. >> > I'm fine with that too --- except you will need to name the initial seed council. -Travis
Re: [Numpy-discussion] Governance model request
Thank you for posting that draft as it is a useful comparison to borrow from. I think Nathaniel's original document is a great start. Perhaps some tweaks along the lines of what you and Matt have suggested could also be useful. I agree that my proposal is mostly about altering the governance model, mixed with some concern about being "automatically disqualified" from a council that can decide the future of NumPy if things don't move forward. -Travis On Tue, Sep 22, 2015 at 12:57 AM, Stefan van der Walt <stef...@berkeley.edu> wrote: > On 2015-09-20 11:20:28, Travis Oliphant <tra...@continuum.io> wrote: > > I would recommend three possible adjustments to the steering council > > concept. > > > > 1 - define a BDFL for the council. I would nominate chuck Harris > > > > 2 - limit the council to 3 people. I would nominate chuck, nathaniel, > and > > pauli. > > > > 3 - add me as a permanent member of the steering council. > > I would split the above into two parts: a suggestion on how to change > the governance model (first half of 1 and 2) and then some thoughts on > what to do once those changes have been made (latter half of 1 and 2, as > well as 3). > > For now, since those changes are not in place yet, it's probably best > to focus on the governance model. > > I would agree that one person (or a very small group) is best suited to > "getting things unstuck". And, personally, I believe it best for that > person/persons to be elected by the community (whatever we define "the > community" to be)---which is what I presume you suggested when you > mentioned nominating candidates. > > Since Matthew mentioned the governance proposal we're working on, here > is a very early draft: > > > https://github.com/stefanv/skimage-org/blob/governance_proposal/governance.md > > As I said, this is still a work-in-progress--comments are welcome. > E.g., the weighting element in the voting has to be fine tuned (but was > put in place to prevent rapid take-overs). 
> > Essentially, we need: > > - a way for community members to express disagreement without being > ousted, > - protection against individuals who want to exert disproportional > influence, > - protection against those in leadership roles who cause the project > long-term harm, > - and a way for the community to change the direction of the project if > they so wished. > > Stéfan > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] 1.10.0rc1 coming tomorrow, 22 Sept.
Of course it will be 1.10.0 final where all the problems will show up suddenly :-) Perhaps we can get to where we are testing Anaconda against beta releases better. -Travis On Mon, Sep 21, 2015 at 5:19 PM, Charles R Harris <charlesr.har...@gmail.com > wrote: > Hi All, > > Just a heads up. The lack of reported problems in 1.10.0b1 has been > stunning. > > Chuck > > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io
Re: [Numpy-discussion] Commit rights for Allan Haldane
Excellent news! Welcome Allan. -Travis On Tue, Sep 22, 2015 at 1:54 PM, Charles R Harris <charlesr.har...@gmail.com > wrote: > Hi All, > > Allan Haldane has been given commit rights. Here's to the new member of > the team. > > Chuck > > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io
Re: [Numpy-discussion] Governance model request
On Tue, Sep 22, 2015 at 1:20 PM, Stefan van der Walt wrote: > > > I guess we've gone off the rails pretty far at this point, so let me at > least take a step back, and make sure that you know that: > > - I have never doubted that your intentions for NumPy are anything but > good (I know they are!), > - I *want* the community to be a welcoming place for companies to > contribute (otherwise, I guess I'd not be such a fervent supporter of > the scientific eco-system using the BSD license), and > - I love your enthusiasm for the project. After all, that is a big part > of what inspired me to become involved in the first place. > > My goal is not to spread uncertainty, fear nor doubt—if that was the > perception left, I apologize. > > I'll re-iterate that I wanted to highlight a concern about the > interactions of a (somewhat weakly cohesive) community and strong, > driven personalities such as yourself backed by a formidable amount of > development power. No matter how good your intentions are, there are > risks involved in this kind of interaction, and if we fail to even > *admit* that, we are in trouble. > > Lest the above be read in a negative light again, let me state it > up-front: *I don't think you will hijack the project, use it for your > own gain, or attempt to do anything you don't believe to be in the best > interest of NumPy.* What I'm saying is that we absolutely need to move > forward in a way that brings everyone along, and makes everyone rest > assured that their voice will be heard. > > Thank you for the clarification. I'm sorry that I started to question your intentions. I agree that everyone should rest assured that their voice will be heard. I have been and continue to be a staunch advocate for the voices that are not even on this mailing list. > Also, please know that I have not discussed these matters with Nathaniel > behind the scenes, other than an informal hour-long discussion about his > original governance proposal. 
There is no BIDS conspiracy or attempts > at crucifixion. After all, you were an invited guest speaker at an > event I organized this weekend, since I value your opinion and insights. > > Thank you. I'm sorry for implying otherwise. That was wrong of me. I know we are just trying to bring all the voices to the table. Best, -Travis
Re: [Numpy-discussion] Governance model request
I am not upset nor was I ever upset about discussing the possibility of conflict of interest. Of course it can be discussed --- but it should be discussed directly about specific things --- and as others have said it is generally easily handled when it actually could arise. The key is to understand affiliations. We should not do things in the community that actually encourage people to hide their affiliations for fear of backlash or bias. I was annoyed at the insinuation that conflict of interest is a company-only problem that academics are somehow immune to. I was upset about accusations and misinterpretations of my activities and those of my colleagues on behalf of the community. On Tue, Sep 22, 2015 at 1:48 PM, Matthew Brett <matthew.br...@gmail.com> wrote: > Hi, > > On Tue, Sep 22, 2015 at 11:20 AM, Stefan van der Walt > <stef...@berkeley.edu> wrote: > > Hi Travis > > > > On 2015-09-22 03:44:12, Travis Oliphant <tra...@continuum.io> wrote: > >> I'm actually offended that so many at BIDS seem eager to crucify my > >> intentions when I've done nothing but give away my time, my energy, my > >> resources, and my sleep to NumPy for many, many years. I guess if > your > >> intent is to drive me away, then you are succeeding. > > > > I guess we've gone off the rails pretty far at this point, so let me at > > least take a step back, and make sure that you know that: > > > > - I have never doubted that your intentions for NumPy are anything but > > good (I know they are!), > > - I *want* the community to be a welcoming place for companies to > > contribute (otherwise, I guess I'd not be such a fervent supporter of > > the scientific eco-system using the BSD license), and > > - I love your enthusiasm for the project. After all, that is a big part > > of what inspired me to become involved in the first place. > > > > My goal is not to spread uncertainty, fear nor doubt—if that was the > > perception left, I apologize. 
> > > > I'll re-iterate that I wanted to highlight a concern about the > > interactions of a (somewhat weakly cohesive) community and strong, > > driven personalities such as yourself backed by a formidable amount of > > development power. No matter how good your intentions are, there are > > risks involved in this kind of interaction, and if we fail to even > > *admit* that, we are in trouble. > > > > Lest the above be read in a negative light again, let me state it > > up-front: *I don't think you will hijack the project, use it for your > > own gain, or attempt to do anything you don't believe to be in the best > > interest of NumPy.* What I'm saying is that we absolutely need to move > > forward in a way that brings everyone along, and makes everyone rest > > assured that their voice will be heard. > > > > Also, please know that I have not discussed these matters with Nathaniel > > behind the scenes, other than an informal hour-long discussion about his > > original governance proposal. There is no BIDS conspiracy or attempts > > at crucifixion. After all, you were an invited guest speaker at an > > event I organized this weekend, since I value your opinion and insights. > > > > Either way, let me again apologize if my suggested lack of insight hurt > > people's feelings. I can only hope that, in educating me, we all learn > > a few lessons. > > I'm also in favor of taking a step back. > > The point is that a sensible organization and a sensible leader has > to take the possibility of conflict of interest into account. They > also have to consider the perception of a conflict of interest. > > It is the opposite of sensible to respond to this with "how dare you" > or by asserting that this could never happen or by saying that we > shouldn't talk about that in case people get frightened. I point you > again to Linus' interview [1]. 
He is not upset that he has been > insulted by the implication of conflict of interest, he soberly > accepts that this will always be an issue, with companies in > particular, and goes out of his way to address that in an explicit and > reasonable way. > > Cheers, > > Matthew > > [1] http://www.bbc.com/news/technology-18419231 > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Governance model request
On Tue, Sep 22, 2015 at 2:16 AM, Stefan van der Walt <stef...@berkeley.edu> wrote: > Hi Travis > > On 2015-09-21 23:29:12, Travis Oliphant <tra...@continuum.io> wrote: > > 1) nobody believes that the community should be forced to adopt numba > as > > part of ufunc core yet --- but this could happen someday just as Cython > is > > now being adopted but was proposed 8 years ago that it "could be adopted" > > That's a red herring. > > Yes, I'd like to clarify: I was not against including any specific > technology in NumPy. I was highlighting that there may be different > motivations for members of the general community and those working for, > say, Continuum, to get certain features adopted. > This is what I'm calling you out on. Why? I think that is an unfair statement and inaccurate. The general community includes Continuum, Enthought, Microsoft, Intel, various hedge funds, investment banks and companies large and small. Are you saying that people should not be upfront about their affiliations with a company? That if they are not academics, then they should not participate in the discussion? It is hard enough to be at a company and get time to contribute effort back to an open source project. We should not be questioning people's motives just *because* they are at a company. We should not assume people cannot think in terms of the success of the project, just because they are at a company. Their proposals and contributions can be evaluated on their merits and value --- so this whole discussion seems to be just revealing an anti-company paranoia rather than helping understand the actual concern. > > 2) I have stated that breaking the ABI is of little consequence because > > of conda as well as other tools. I still believe that. This has > nothing > > to do with any benefit Continuum might or might not receive because of > > conda. Everyone else who wants to make a conda-based distribution also > > benefits (Cloudera, Microsoft, Intel, ...) or use conda also benefits. 
> > I don't think the community realizes the damage that is done with FUD > like > > this. There are real implications. It halts progress, creates > confusion, > > and I think ultimately damages the community. > > This is an old argument, and the reason why we have extensive measures > in place to guard against ABI breakage. But, reading what you wrote > above, I would like to understand better what FUD you are referring to, > because I, rightly or wrongly, believe there is a real concern here that > is being glossed over. > I don't know which is the "old argument". Anyway, old arguments can still be right. The fact is that not breaking the ABI has caused real damage to the community. NumPy was never designed to not have its ABI broken for over a decade. We have some attempts to guard against ABI breakage --- but they are not perfect. We have not moved the code-base forward for fear of breaking the ABI. When it was hard to update your Python installation that was a concern. There are very few cases where this is still the concern (conda is a big part of it but not the only part as other distros and approaches for easily updating the install exist) --- having this drive major architecture decisions is a serious mistake in my mind, and causes a lot more work than it should. The FUD I'm talking about is the anti-company FUD that has influenced discussions in the past. I really hope that we can move past this. > > > I don't see how. None of these have been proposed for integrating into > > NumPy. I don't see how integrating numba into NumPy benefits Continuum > > at all. It's much easier for us to keep it separate. At this point > > Continuum doesn't have an opinion about integrating DyND into NumPy or > > not. > > I think that touches, tangentially at least, on the problem. If an > employee of Continuum were steering NumPy, and the company developed an > opinion on those integrations, would such a person not feel compelled to > toe the company line? 
(Whether the company is Continuum or another is > beside the point—I am only trying to understand the dynamics of working > for a company and leading an open source project that closely interacts > with their products.) > O.K. if you are honestly asking this question out of inexperience, then I can at least help you understand because perhaps that is the problem (creating a straw-man that doesn't exist). I have never seen a motivated open source developer at a company who "toes the company line" within a community project that is accepted long term. All that would do is drive the developer out of the company and be a sure-fire way to make sure their contributions are not accepted. I know that a
Re: [Numpy-discussion] Governance model request
bit" to differentiate people in the community. -Travis On Tue, Sep 22, 2015 at 2:11 AM, Nathaniel Smith <n...@pobox.com> wrote: > On Mon, Sep 21, 2015 at 9:20 AM, Travis Oliphant <tra...@continuum.io> > wrote: > > > > I wrote my recommendations quickly before heading on a plane. I hope > the spirit of them was caught correctly. I also want to re-emphasize > that I completely understand that the Steering Council is not to be making > decisions that often and almost all activity will be similar to how it is now > --- discussion, debate, proposals, and pull-requests --- that is a good > thing. > > > > However, there is a need for leadership to help unstick things and move > the project forward from time to time because quite often doing *something* > can be better than trying to please everyone with a voice. My concerns > about how to do this judgment have 2 major components: > > > > 1) The need for long-term consistency --- a one-year horizon on defining > this group is too short in my mind for a decades-old project like NumPy. > > 2) The group that helps unstick things needs to be small (1, 3, or 5 at > the most) > > For reference, the rules for steering council membership were taken > directly from those used by the Jupyter project, and their steering > council currently has 10 people, making it larger than the "seed > council" proposed in the numpy document: > https://github.com/jupyter/governance/blob/master/people.md > > > We could call this group the "adjudication group" rather than the > "Steering Council" as well. I could see that having a formal method of > changing that "adjudication group" would be a good idea as well (and > perhaps that formal vote could be made by a vote of a group of active > contributors. In that case, I would define active as having a time-window > of 5 years instead of just 1). 
> > I may be misreading things, but I'm getting the impression that the > active "adjudication group" you envision is radically different from > the "steering council" as envisioned by the current governance > document. It also, I think, radically different from anything I've > ever seen in a functioning community-run FOSS project and frankly it's > something where if I saw a project using this model, it would make me > extremely wary about contributing. > > The key point that I think differs is that you envision that this > "adjudication group" will actually intervene into discussions and make > formal decisions in situations other than true irreconcilable crises, > which in my estimation happen approximately never. The only two kinds > of F/OSS projects that I can think of that run like this are (a) > projects that are not really community driven at all, but rather run > as internal company projects that happen to have a public repository, > (b) massive projects like Debian and Fedora that have to manage > literally thousands of contributors, and thus have especially robust > backstop procedures to handle the rare truly irreconcilable situation. > > E.g., the Debian CTTE acts as an "adjudication group" in the way it > sounds like you envision it: on a regular basis, irreconcilable > arguments in Debian get taken to them to decide, and they issue a > ruling. By some back of the envelope calculations, it looks like they > issue approximately ~0.002 rulings per debian-contributor-year [1][2]. > If we assume crudely that irreconcilable differences scale linearly > with the size of a project, this suggests that a ~20 person project > like NumPy should require a ruling ~once every 20 years. > > Or quoting myself from the last thread about this [3]: > ] Or on the other end of things, you have e.g. Subversion, which had an > ] elaborate defined governance system with different levels of > ] "core-ness", a voting system, etc. 
-- and they were 6 years into the > ] project before they had their first vote. (The vote was on the crucial > ] technical decision of whether to write function calls like "f ()" or > ] "f()".) > > These are two real projects and how they really work. And even in > projects that do have a BDFL, the successful ones almost never use > this power to actually "unstick things" (i.e., use their formal power > to resolve a discussion). Consider PEP 484, Guido's somewhat > controversial type hints proposal: rather than use his power to move > the debate along, he explicitly delegated his power to one of the > idea's strongest critics [4]. > > Of course, things do get stuck. But the only time that getting them > unstuck needs or even benefits from the existence of a formal > "unstick
Re: [Numpy-discussion] Governance model request
> > > > > May? Can you elaborate? More speculation. My own position is that > > these projects want to integrate with NumPy, not the > > converse. Regardless of my opinion, can you actually make any specific > > arguments, one way or the other? What if some integrations > > actually make more sense for the community? Is this simply a dogmatic > > ideological position that anything whatsoever that benefits both NumPy > > and Continuum simultaneously is bad, on principle? That's fine, as > > such, but let's make that position explicit if that's all it is. > > No, I don't have such a dogmatic ideological position. I think, > however, that it is somewhat unimaginative to propose that there are no > potential conflicts whatsoever. > > I am happy if we can find solutions that benefit both numpy and any > company out there. But in the end, I'm sure you'd agree that we want > the decisions that lead to such solutions to be taken in the best > interest of the project, and not be weighed by ulterior motivations of > any sort. In the end, even the *perception* that that is not the case > can be very harmful. > I will only comment on the last point. I completely agree that the *perception* that this is not the case can be harmful. But, what concerns me is where this perception comes from --- from actual evidence of anything that is not in the best interests of the project --- or just ideological differences of opinion about the way the world works and the perceptions around open source and markets. It is quite easy for someone to spread FUD about companies that contribute to open source --- and it has the effect of discouraging companies from continuing to contribute to community projects. This removes a huge amount of potential support from projects. 
In NumPy's case in particular, this kind of attitude basically guarantees that I won't be able to contribute effectively and potentially even people I fund to contribute might not be accepted --- not because we can't faithfully participate in the same spirit that we have always contributed to SciPy and NumPy and other open source projects --- but because people are basically going to question things just because. What exactly do you need me to say to get you to believe that I have nothing but the best interests of array computing in Python at heart? The only thing that is different between me today and me 18 years ago is that 1) I have more resources now, 2) I have more knowledge about computer science and software architecture and 3) I have more experience with how NumPy gets used. All I can do is continue to try and make things better the best way I know how. -Travis -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Governance model request
On Tue, Sep 22, 2015 at 1:07 AM, Stefan van der Walt <stef...@berkeley.edu> wrote: > On 2015-09-21 22:15:55, Bryan Van de Ven <bry...@continuum.io> wrote: > > Beyond that, what (even in a broad sense) is an example of a goal that > > "Continuum might need" that would conceivably do detriment to the > > NumPy community? That it be faster? Simpler to maintain? Easier to > > extend? Integrate better with more OS projects? Attract new active > > developers? Receive more financial support? Grow its user base even > > more? > > I don't know how productive it is to dream up examples, but it's not > very hard to do. Currently, e.g., the community is not ready to adopt > numba as part of the ufunc core. But it's been stated by some that, > with so many people running Conda, breaking the ABI is of little > consequence. And then it wouldn't be much of a leap to think that numba > is an acceptable dependency. > A couple of things to help clarify: 1) nobody believes that the community should be forced to adopt numba as part of ufunc core yet --- but this could happen someday just as Cython is now being adopted but was proposed 8 years ago that it "could be adopted". That's a red herring. 2) I have stated that breaking the ABI is of little consequence because of conda as well as other tools. I still believe that. This has nothing to do with any benefit Continuum might or might not receive because of conda. Everyone else who wants to make a conda-based distribution also benefits (Cloudera, Microsoft, Intel, ...) or use conda also benefits. I don't think the community realizes the damage that is done with FUD like this. There are real implications. It halts progress, creates confusion, and I think ultimately damages the community. Numba being an acceptable dependency means a lot more than conda --- it's dependent on LLVM compiled support which would have to be carefully tested --- first as only an optional dependency for many years. 
> > There's a broad range of Continuum projects that intersect with what > NumPy does: numba, DyND, dask and Odo to name a few. Integrating them > into NumPy may make a lot more sense for someone from Continuum than for > other members of the community. > I don't see how. None of these have been proposed for integrating into NumPy. I don't see how integrating numba into NumPy benefits Continuum at all. It's much easier for us to keep it separate. At this point Continuum doesn't have an opinion about integrating DyND into NumPy or not. These projects will all succeed or fail on their own based on users' needs. Whether or not they ever become a part of NumPy will depend on whether they are useful as such, not because a person at Continuum is part of a steering committee (with other people on it). I know that you were responding to a specific question by Bryan as to how there could be a conflict of interest for Continuum and NumPy development. I don't think this is a useful conversation --- we could dream up all kinds of conflicts of interest for BIDS and NumPy too (e.g. perhaps BIDS really wants Spark to take over and for NumPy to have special connections to Spark). Are we to not allow anyone at BIDS to participate in the steering council because of their other interests? But remember, the original point is whether or not someone from Continuum (or I presume any company and not just singling out Continuum for special treatment) should be on the steering council. Are you really arguing that they shouldn't because there are other projects Continuum is working on that have some overlap with NumPy? I really hope you don't actually believe that. 
-Travis > Stéfan > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] 1.10.0rc1 coming tomorrow, 22 Sept.
Absolutely it would be good if others can test. All I was suggesting is that we do run a pretty decent set of tests upon build and that would be helpful. If the numpy build recipes are not available, it is only because they have not been updated to use conda-build yet. If somebody wants to volunteer to convert all of our internal recipes to conda-build recipes so they could be open source --- we would welcome the help. But, it's not just the numpy recipes, it's the downstream binaries and their test-suite as well that is useful to run. I am hoping we will have something automatic here in the next few months on anaconda.org that will make this easier -- but no promises at this point. -Travis On Tue, Sep 22, 2015 at 2:19 AM, Nathaniel Smith <n...@pobox.com> wrote: > On Sep 21, 2015 11:51 PM, "Travis Oliphant" <tra...@continuum.io> wrote: > > > > Of course it will be 1.10.0 final where all the problems will show up > suddenly :-) > > > > Perhaps we can get to where we are testing Anaconda against beta > releases better. > > The most useful thing would actually not even involve you doing any more > testing, but just if you could make builds available so that end-users > could easily conda install the prereleases and do their own testing against > their own choice. In principle I guess we could provide our own binstar > channel for this, but it's difficult given that AFAIK rebuilding numpy in > conda requires also rebuilding the whole stack, and the numpy build recipes > are still proprietary. > > -n > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Governance model request
s part of the ufunc core. But it's been stated by some that, > > > > Who are you speaking for? The entire community? Under what mandate? > > > >> with so many people running Conda, breaking the ABI is of little > >> consequence. And then it wouldn't be much of a leap to think that numba > >> is an acceptable dependency. > > > > The current somewhat concrete proposal I am aware of involves funding > cleaning up dtypes. Is there another concrete, credible proposal to make > Numba a dependency of NumPy that you can refer to? If not, why are we mired > in hypotheticals? > > > >> There's a broad range of Continuum projects that intersect with what > >> NumPy does: numba, DyND, dask and Odo to name a few. Integrating them > >> into NumPy may make a lot more sense for someone from Continuum than for > >> other members of the community. > > > > May? Can you elaborate? More speculation. My own position is that these > projects want to integrate with NumPy, not the converse. Regardless of my > opinion, can you actually make any specific arguments, one way or the > other? What if some integrations actually make more sense for the > community? Is this simply a dogmatic ideological position that anything > whatsoever that benefits both NumPy and Continuum simultaneously is bad, on > principle? That's fine, as such, but let's make that position explicit if > that's all it is. > > > > Bryan > > ___ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > Nathaniel J. Smith -- http://vorpus.org > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Governance model request
On Tue, Sep 22, 2015 at 4:33 AM, Nathaniel Smith <n...@pobox.com> wrote: > On Tue, Sep 22, 2015 at 1:24 AM, Travis Oliphant <tra...@continuum.io> > wrote: > >> I actually do agree with your view of the steering council as being >> usually not really being needed.You are creating a straw-man by >> indicating otherwise.I don't believe a small council should do anything >> *except* resolve disputes that cannot be resolved without one. Like you, I >> would expect that would almost never happen --- but I would argue that >> extrapolating from Debian's experience is not actually relevant here. >> > > To be clear, Debian was only one example -- what I'm extrapolating from is > every community-driven F/OSS project that I'm aware of. > > It's entirely possible my data set is incomplete -- if you have some other > examples that you think would be better to extrapolate from, then I'd be > genuinely glad to hear them. You may have noticed that I'm a bit of an > enthusiast on this topic :-). > > Yes, you are much better at that than I am. I'm not even sure where I would look for this kind of data. > >> > So, if the steering council is not really needed then why have it at all? >> Let's just eliminate the concept entirely. >> >> > In my view, the reasons for having such a council are: > 1) The framework is useful even if you never use it, because it means > people can run "what if" scenarios in their mind and make decisions on that > basis. In the US legal system, only a vanishingly small fraction of cases > go to the Supreme Court -- but the rules governing the Supreme Court have a > huge effect on all cases, because people can reason about what would happen > *if* they tried to appeal to the Supreme Court. > O.K. That is a good point. I can see the value in that. > 2) It provides a formal structure for interfacing with the outside world. 
> E.g., one can't do anything with money or corporate contributions without > having some kind of written-down and enforceable rules for making decisions > (even if in practice you always stick to the "everyone is equal and we > govern by consensus" part of the rules). > O.K. > 3) There are rare but important cases where discussions have to be had in > private. The main one is "personnel decisions" like inviting people to join > the council; another example Fernando has mentioned to me is that when they > need to coordinate a press release between the project and a funding body, > the steering council reviews the press release before it goes public. > O.K. > That's pretty much it, IMO. > > The framework we all worked out at the dev meeting in Austin seems to > handle these cases well AFAICT. > How did we "all" work it out when not everyone was there? This is where I get lost. You talk about community decision making and yet any actual decision is always a subset of the community.I suppose you just rely on the "if nobody complains than it's o.k." rule? That really only works if the project is moving slowly. > But there are real questions that have to have an answer or an approach to >> making a decision. The answer to these questions cannot really be a vague >> notion of "lack of vigorous opposition by people who read the mailing list" >> which then gets parried about as "the community decided this." The NumPy >> user base is far, far larger than the number of people that read this list. >> > > According to the dev meeting rules, no particularly "vigorous opposition" > is required -- anyone who notices that something bad is happening can write > a single email and stop an idea dead in its tracks, with only the steering > council able to overrule. 
> We expect this will rarely if ever happen,
> because the threat will be enough to keep everyone honest and listening,
> but about the only way we could possibly be *more* democratic is if we
> started phoning up random users at home to ask their opinion.
>

O.K. so how long is the time allowed for this kind of opposition to be noted?

>
> This is actually explicitly designed to prevent the situation where
> whoever talks the loudest and longest wins, and to put those with more and
> less time available on an equal footing.
>
>
>> For better or for worse, we will always be subject to the "tyranny of who
>> has time to contribute lately". Fundamentally, I would argue that this
>> kind of "tyranny" should at least be tempered by additional considerations
>> from long-time contributors who may also be acting more indirectly than is
>> measured by a simple git log.
>>
>
> I guess I am mi
Re: [Numpy-discussion] Governance model request
ctive developers, who make sure the project does not go off
> the rails. I think you'd be an excellent and obvious trustee, in
> that model.
>

I like the trustee model too and think such an addition to the NumPy concept would help alleviate my concerns about actually being on a "steering committee", but my preferred outcome is actually that the agreed-upon steering council be smaller and that the people who have a right to vote on things like the make-up of the steering committee be people who have been significantly involved in the past 3 years (not just the past one year).

-Travis

>
> Cheers,
>
> Matthew
>
>
> [1] http://www.bbc.com/news/technology-18419231
Re: [Numpy-discussion] Governance model request
I wrote my recommendations quickly before heading on a plane. I hope the spirit of them was caught correctly. I also want to re-emphasize that I completely understand that the Steering Council is not to be making decisions that often and almost all activity will be similar to what it is now --- discussion, debate, proposals, and pull-requests --- that is a good thing.

However, there is a need for leadership to help unstick things and move the project forward from time to time because quite often doing *something* can be better than trying to please everyone with a voice.

My concerns about how to do this judgment have 2 major components:

1) The need for long-term consistency --- a one-year horizon on defining this group is too short in my mind for a decades-old project like NumPy.

2) The group that helps unstick things needs to be small (1, 3, or 5 at the most).

We could call this group the "adjudication group" rather than the "Steering Council" as well. I could see that having a formal method of changing that "adjudication group" would be a good idea as well (and perhaps that formal change could be made by a vote of a group of active contributors. In that case, I would define active as having a time-window of 5 years instead of just 1).

Thanks,

-Travis

On Mon, Sep 21, 2015 at 2:39 AM, Sebastian Berg <sebast...@sipsolutions.net> wrote:

> On Mon, 2015-09-21 at 11:32 +0200, Sebastian Berg wrote:
> > On Sun, 2015-09-20 at 11:20 -0700, Travis Oliphant wrote:
> > > After long conversations at BIDS this weekend and after reading the
> > > entire governance document, I realized that the steering council is
> > > very large and I don't agree with the mechanism by which it is
> > > chosen.
> > >
> >
> > Hmmm, well I never had the impression that the steering council would be
> > huge.
> > But maybe you are right, and if it is, I could imagine something
> > like option 2, but vote based (could possibly dual use those in charge
> > of NumFOCUS relations, we had even discussed this possibility) which
> > would have final say if necessary (could mean that the contributors
> > definition could be broadened a bit).
> > However, I am not sure this is what you suggested, because for me it
> > should be a regular vote (if just because I am scared of having to make
> > the right pick). And while I will not block this if others agree, I am
> > currently not comfortable with either picking a BDFL (sorry guys :P) or
> > very fond of an oligarchy for life.
> >
> > Anyway, I still don't claim to have a good grasp on these things, but
> > without a vote, it seems a bit what Matthew warned about.
> >
> > One thing I could imagine is something like an "Advisory Board", without
> > (much) formal power. If we had a voted Steering Council, it could be the
> > former members + old time contributors which we would choose now. These
> > could be invited to meetings at the very least.
> >
> > Just my current, probably not well thought out thoughts on the matter.
> > But none of your three options feels very obvious to me unfortunately.
> >
> > - Sebastian
> >
> >
> > > A one year time frame is pretty short in the context of a two decades
> > > old project and I believe the current council has too few people who
> > > have been around the community long enough to help unstick difficult
> > > situations if that were necessary.
> > >
> > > I would recommend three possible adjustments to the steering council
> > > concept.
> > >
> > > 1 - define a BDFL for the council. I would nominate Chuck Harris
> > >
> > > 2 - limit the council to 3 people. I would nominate Chuck, Nathaniel,
> > > and Pauli.
> > >
> > > 3 - add me as a permanent member of the steering council.
> > >
>
> Though, maybe you should be in the steering council in any case even by
> the current rules.
> Maybe you were not too active for a while, but I
> doubt you will quite stop doing stuff on numpy soon
>
> > > Writing NumPy was a significant amount of work. I have been working
> > > indirectly or directly in support of NumPy continuously since I wrote
> > > it. While I don't actively participate all the time, I still have a
> > > lot of knowledge, context, and experience in how NumPy is used, why it
> > > is the way it is, and how things could be better. I also work with
> > > people directly who have and will contribute regularly.
> > >
> > > I am formally requesting that the steering council concept be adjusted
> > > in one of these three ways.
> > >
[Numpy-discussion] Governance model request
After long conversations at BIDS this weekend and after reading the entire governance document, I realized that the steering council is very large and I don't agree with the mechanism by which it is chosen.

A one year time frame is pretty short in the context of a two decades old project and I believe the current council has too few people who have been around the community long enough to help unstick difficult situations if that were necessary.

I would recommend three possible adjustments to the steering council concept.

1 - define a BDFL for the council. I would nominate Chuck Harris.

2 - limit the council to 3 people. I would nominate Chuck, Nathaniel, and Pauli.

3 - add me as a permanent member of the steering council.

Writing NumPy was a significant amount of work. I have been working indirectly or directly in support of NumPy continuously since I wrote it. While I don't actively participate all the time, I still have a lot of knowledge, context, and experience in how NumPy is used, why it is the way it is, and how things could be better. I also work with people directly who have and will contribute regularly.

I am formally requesting that the steering council concept be adjusted in one of these three ways.

Thanks,

Travis
Re: [Numpy-discussion] The process I intend to follow for any proposed changes to NumPy
Hey Chris (limiting to NumPy only),

I've had some great conversations with Nathaniel in the past few days and I'm glad he posted his thoughts so that there is no confusion about governance or what I was implying. With respect to governance, I'm very supportive of what everyone is doing in organizing a governance document and approach and appreciate the effort of Nathaniel and others to move this forward. Nothing I said was meant to imply differently. I'm sorry if it made anyone nervous.

I'm a very enthusiastic person when I get an idea of what to do. I like to see things implemented. In this case, it also turns out that in terms of overall architecture, my ideas are actually very similar to Nathaniel's ideas. That's a good sign. We have different tactical approaches as to how to move forward, but I think it's a good thing to note that we see a very similar path forward. Nothing will be done in NumPy itself except via pull-request and review.

My approach for the ideas I'm pursuing will be to organize people around two new prototype packages I'm calling memtype and gufunc. The purpose of these is to allow playing with the design and ideas quickly before looking at how to put them into NumPy itself --- there will also be some training involved in getting people up to speed. There was a long discussion today at this BIDS data-structures for data-science summit, part of which talked about how to improve NumPy's dtype system.

I would love to see these independent objects evolve into independent packages that could even go into the Python standard library. Not everyone agrees that is the best idea, but regardless of whether this happens or not, the intent is to do work that could go into NumPy now.

I look forward to the activity.

-Travis

On Mon, Sep 14, 2015 at 10:46 AM, Chris Barker <chris.bar...@noaa.gov> wrote:

> Travis,
>
> I'm sure you appreciate that this might all look a bit scary, given the
> recent discussion about numpy governance.
>
> But it's an open-source project, and I, at least, fully understand that
> going through a big process is NOT the way to get a new idea tried out and
> implemented. So I think this is a great development -- I know I want
> to see something like this dtype work done.
>
> So, as someone who has been around this community for a long time, and
> dependent on Numeric, numarray, and numpy over the years, this looks like a
> great development.
>
> And, in fact, with the new governance effort -- I think less scary --
> people can go off and work on a branch or fork, do good stuff, and we, as a
> community, can be assured that API (or even ABI) changes won't be thrust
> upon us unawares :-)
>
> As for the technical details -- I get a bit lost, not fully understanding
> the current dtype system either, but do your ideas take us in the direction
> of having dtypes independent of the container and ufunc machinery -- and
> thus easier to create new dtypes (even in Python?) 'cause that would be
> great.
>
> I hope you find the partner you're looking for -- that's a challenge!
>
> -Chris
[Numpy-discussion] The process I intend to follow for any proposed changes to NumPy
Hey all,

I just wanted to clarify that I am very excited about a few ideas I have --- but I don't have time myself to engage in the community process to get these changes into NumPy. However, those are real processes --- I've been coaching a few people in those processes for the past several years already.

So, rather than do nothing, what I'm looking to do is to work with a few people who I can share my ideas with, get excited about the ideas, and then who will work with the community to get them implemented. That's what I was announcing and talking about yesterday --- looking for interested people who want to work on NumPy *with* the NumPy community.

In my enthusiasm, I realize that some may have misunderstood my intention. There is no 'imminent' fork, nor am I planning on doing some crazy amount of work that I then try to force on other developers of NumPy.

What I'm planning to do is find people to train on the NumPy code base (people to increase the diversity of the developers would be ideal -- but hard to accomplish). I plan to train them on NumPy based on my experience, and on what I think should be done --- and then have *them* work through the community process and engage with others to get consensus (hopefully not losing too much in translation in the process --- but instead getting even better).

During that process I will engage as a member of the community and help write NEPs and other documents and help clarify where it makes sense as I can. I will be filtering for people that actually want to see NumPy get better. Until I identify the people and work with them, it will be hard to tell how this will best work. So, stay tuned.

If all goes well, what you should see in a few weeks time are specific proposals, a branch or two, and the beginnings of some pull requests. If you don't see that, then I will not have found the right people to help me, and we will all continue to go back to searching.
While I'm expecting the best, in the worst case, we get additional people who know the NumPy code base and can help squash bugs as well as implement changes that are desired.

Three things are needed if you want to participate in this:

1) A willingness to work with the open source community,
2) a deep knowledge of C and in particular CPython's brand of C, and
3) a willingness to engage with me, do a mind-meld and dump around the NumPy code base, and then improve on what is in my head with the rest of the community.

Thanks,

-Travis
[Numpy-discussion] Just to this list --- more details of the approach
Hey all,

To the NumPy list only, I'll at least give the highlights of the surgical approach I would like to get someone to work on -- I can help mentor and guide. These are just the highlights, but it should give someone familiar with the code the general gist. There are some details to work out, of course, but it could be done. It may be very similar to what Nathaniel is contemplating --- except I think breaking the ABI is the only way to really do this --- could be wrong but I'm not willing to risk *not* just breaking the ABI.

1) Create a new meta-type in C (call it dtype)

2) Create Python Classes (in C) that are instances of this meta-type for each "kind" of data-type

3) Make PyArray_Descr * be a reference to one of these new objects (which can be built either in C or Python) and should be published outside NumPy as well.

4) Remove most of the "per-type function calls" in PyArray_ArrFuncs --- instead replacing those with the Generalized Ufunc equivalents and expand the capability of Generalized Ufuncs

5) Keep the Array Scalar Types but change them so that they also use the dtype meta-type as their foundation and mixin an array-methods type. Also, have these be in a separate project from NumPy itself.

6) The current void* would be replaced with real Python classes instead of structured arrays being shoved through a single data-type.

7) The documented ways to spell a dtype would be reduced --- but backwards compatibility would be preserved.

8) Make sure Numba can create these Descriptor objects with Ahead of Time Compilation and start to move code of NumPy to Numba

9) Ensure the Generalized Ufunc framework can take the data-type as an argument so that *all* data-types can participate in the general multi-method approach.

There is more to it, but that is the basic idea. Please forgive me if I can't respond to any feedback from the list in a timely way. I will as I can.
-Travis
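
Steps 1 and 2 of the list above (a dtype meta-type, with one class per "kind" of data-type as an instance of it) can be loosely sketched in pure Python. This is only an illustration of the idea, not the proposed C implementation: the names `DTypeMeta`, `DType`, `Float64`, `Int32`, `registry`, `pack`, and `unpack` are all hypothetical, and the standard-library `struct` module stands in for NumPy's memory layout machinery.

```python
import struct


class DTypeMeta(type):
    """Hypothetical meta-type: each data-type "kind" is a class created by it."""

    registry = {}  # maps class name -> concrete dtype class (illustrative only)

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        fmt = namespace.get("format")
        if fmt is not None:
            # Concrete dtypes declare a struct format; derive the item size from it.
            cls.itemsize = struct.calcsize(fmt)
            mcls.registry[name] = cls
        return cls


class DType(metaclass=DTypeMeta):
    """Abstract base: concrete kinds override `format`."""

    format = None

    @classmethod
    def pack(cls, value):
        # Convert one Python value to its raw bytes for this dtype.
        return struct.pack(cls.format, value)

    @classmethod
    def unpack(cls, raw):
        # Convert raw bytes back to one Python value.
        return struct.unpack(cls.format, raw)[0]


class Float64(DType):
    format = "<d"  # little-endian IEEE double


class Int32(DType):
    format = "<i"  # little-endian 32-bit signed integer
```

In this sketch `isinstance(Float64, DTypeMeta)` holds, so data-types are ordinary classes that can be subclassed or created outside the array library, which is the property the list asks for. Everything else in the list (PyArray_Descr, generalized ufunc dispatch, Numba integration) is outside what this toy covers.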
[Numpy-discussion] Looking for a developer who will work with me for at least 6 months to fix NumPy's dtype system.
Hi all,

Apologies for cross-posting, but I need to get the word out and Twitter doesn't provide enough explanation.

I've been working on a second edition of my "Guide to NumPy" book. It's been a time-pressured activity, but it's helped me put more meat around my ideas for how to fix NumPy's dtype system -- which I've been contemplating off and on for 8 years.

I'm pretty sure I know exactly how to do it --- in a way that fits more cleanly into Python. It will take 3-6 months and will have residual efforts needed that will last another 6 months --- making more types available with NumPy, improving calculations etc.

This work will be done completely in public view and allow for public comment. It will not solve *all* of NumPy's problems, but it will put NumPy's dtype system on the footing it in retrospect should have been put on in the first place (if I had known then what I know now).

It won't be a grandiose rewrite. It will be a pretty surgical fix to a few key places in the code. However, it will break the ABI and require recompilation of NumPy extensions (and so would need to be called NumPy 2.0). This is unavoidable, but I don't see any problem with breaking the ABI today given how easy it is to get distributions of Python these days from a variety of sources (including using conda --- but not only using conda).

For those that remember what happened in Python dev land, the changes will be similar to when Guido changed Python 1.5.2 to Python 2.0.

I can mentor and work closely with someone who will work on this and we will invite full participation and feedback from whomever in the community also wants to participate --- but I can't do it myself full time (and it needs someone full time+). Fortunately, I can pay someone to do it if they are willing to commit at least 6 months (it is not required to work at Continuum for this, but you can have a job at Continuum if you want one).
I'm only looking for people who have enough experience with C or preferably the Python C-API. You also have to *want* to work on this. You need to be willing to work with me on the project directly and work to have a mind-meld with my ideas which will undoubtedly give rise to additional perspectives and ideas for later work for you.

When I wrote NumPy 1.0, I put in 80+ hour weeks for about 6 months or more and then 60+ hour weeks for another year. I was pretty obsessed with it. This won't need quite that effort, but it will need something like it. Being able to move to Austin is a plus but not required. I can sponsor a visa for the right candidate as well (though it's not guaranteed you will get one with the immigration policies what they are).

This is a labor of love for so many of us and my desire to help the dtype situation in NumPy comes from the same space that my desire to work on NumPy in the first place came from. I will be interviewing people to work on this as not everyone who may want to will really be qualified to do it --- especially with so many people writing Cython these days instead of good-ole C-API code :-)

Feel free to spread the news to anyone you can. I won't say more until I've found someone to work with me on this --- because I won't have the time to follow-up with any questions or comments. Even if I can't find someone I will publish the ideas --- but that also takes time and effort that is in short supply for me right now.

If there is someone willing to fund this work, please let me know as well -- that could free up more of my time.

Best,

-Travis
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
as we both have the same ultimate interests in seeing array-computing in Python improve. I just don't support *major* changes without breaking the ABI without a whole lot of proof that it is possible (without hackiness). You have mentioned on your roadmap a lot of what I would consider *major* changes. Some of it you describe how to get there. The most important change (improving the dtype system) you don't.

Part of my point is that we now *know* how to improve the dtype system. Let's do it. Let's not try yet again to do it differently inside an old system designed by a scientist who didn't understand type-theory or type systems (that was me by the way). Look at data-shape in the blaze project. Take that and build a Python type-system that also outputs struct-string syntax for memory-views. That's the data-description system that NumPy should be using --- not trying to hack on a mixed array-scalar, dtype-object system that may never support everything we now know is needed.

Trying to increment from where we are now will only lead to a sub-optimal outcome and unfortunate instability when we already know what to do differently. I doubt I will convince you --- certainly not via email. I apologize in advance that I likely won't be able to respond in depth to any more questions that are really just "prove to me that I can't" kind of questions. Of course I can't prove that. All I'm saying is that to me the evidence and my experience leads me to not be able to support major changes like you have proposed without also intentionally breaking the ABI (and thus calling it NumPy 2.0).

If I find time to write, I will try to use it to outline more specifically what I think is a better approach to array- and table-computing in Python that keeps the stability of NumPy and adds new features using different approaches.

-Travis

On Tue, Aug 25, 2015 at 12:00 PM, Travis Oliphant tra...@continuum.io wrote:

Thanks for the write-up Nathaniel.
There is a lot of great detail and interesting ideas here.

I am very eager to understand how to help NumPy and the wider community move forward however I can (my passions on this have not changed since 1999, though what I myself spend time on has changed).

There are a lot of ways to think about approaching this, though. It's hard to get all the ideas on the table, and it was unfortunate we couldn't get everybody who is a core NumPy dev together in person to have this discussion, as there are still a lot of questions unanswered and a lot of thought that has gone into other approaches that was not brought up or represented in the meeting (how does Numba fit into this, what about data-shape, dynd, memory-views and Python type system, etc.). If NumPy becomes just an interface-specification, then why don't we just do that *outside* NumPy itself in a way that doesn't jeopardize the stability of NumPy today? These are some of the real questions I have. I will try to write up my thoughts in more depth soon, but I won't be able to respond in-depth right now. I just wanted to comment because Nathaniel said I disagree, which is only partly true.

The three most important things for me are 1) let's make sure we have representation from as wide of the community as possible (this is really hard), 2) let's look around at the broader community and the prior art that is happening in this space right now and 3) let's not pretend we are going to be able to make all this happen without breaking ABI compatibility. Let's just break ABI compatibility with NumPy 2.0 *and* have as much fidelity with the API and semantics of current NumPy as possible (though there will be some changes necessary long-term).
I don't think we should intentionally break ABI if we can avoid it, but I also don't think we should spend inordinate amounts of time trying to pretend that we won't break ABI (for at least some people), and most importantly we should not pretend *not* to break the ABI when we actually do. We did this once before with the roll-out of date-time, and it was really unnecessary. When I released NumPy 1.0, there were several things that I knew should be fixed very soon (NumPy was never designed to not break ABI). Those problems are still there. Now that we have quite a bit better understanding of what NumPy *should* be (there have been tremendous strides in understanding and community size over the past 10 years), let's actually make the infrastructure we think will last for the next 20 years (instead of trying to shoe-horn new ideas into a 20-year old code-base that wasn't designed for it).

NumPy is a hard code-base. It has been since Numeric days in 1995. I could be wrong, but my guess is that we will be passed by as a community if we don't seize the opportunity to build something better than we can build if we are forced to use a 20 year old code-base.

It is more important to not break people's
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
On Tue, Aug 25, 2015 at 3:58 PM, Charles R Harris charlesr.har...@gmail.com wrote:

On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant tra...@continuum.io wrote:

Thanks for the write-up Nathaniel. There is a lot of great detail and interesting ideas here.

snip

I think that summarizes my main concerns. I will write-up more forward thinking ideas for what else is possible in the coming weeks. In the mean time, thanks for keeping the discussion going. It is extremely exciting to see the help people have continued to provide to maintain and improve NumPy. It will be exciting to see what the next few years bring as well.

I think the only thing that looks even a little bit like a numpy 2.0 at this time is dynd. Rewriting numpy, let alone producing numpy 2.0 is a major project. Dynd is 2.5+ years old, 3500+ commits in, and still in progress. If there is a decision to pursue Dynd I could support that, but I think we would want to think deeply about how to make the transition as painless as possible. It would be good at this point to get some feedback from people currently using dynd. IIRC, part of the reason for starting dynd was the perception that it was not possible to evolve numpy without running into compatibility road blocks. Travis, could you perhaps summarize the thinking that went into the decision to make dynd a separate project?

I think it would be best if Mark Wiebe speaks up here. I can explain why Continuum supported DyND with some fraction of Mark's time for a few years and give my perspective, but ultimately DyND is Mark's story to tell (and a few talented people have now joined him in the effort).

Mark Wiebe was a productive NumPy developer. He was one of a few people that jumped in on the code-base and made substantial and significant changes and came to understand just how hard it can be to develop in the NumPy code-base.
He also is a C++ developer who really likes the beauty and power of that language (which definitely biases his NumPy work, but he did put a lot of effort into making NumPy better). Before Peter and I started Continuum, Mark had begun the DyND project as an example of a general-purpose dynamic array library that could be used by any dynamic language to make arrays.

In the early days of Continuum, we spent time from at least Mark W, Bryan Van de Ven, Jay Borque, and Francesc Alted looking at how to extend NumPy to add 1) categorical data-types, 2) variable-length strings, and 3) better date-time types. Bryan, a good developer who has gone on to be a primary developer of Bokeh, spent quite a bit of time and had a prototype of categoricals *nearly* working. He did not like working on the NumPy code-base at all. He struggled with it and found it very difficult to extend. He worked closely with Mark Wiebe who helped him the best he could. What took him 4 weeks in NumPy took him 3 days in DyND to build. I think that experience convinced him and Mark W both that working with the NumPy code-base would take too long to make significant progress.

Also, during 2012 I was trying to help with release-management (though I ended up just hiring Ondrej Certik to actually do the work and he did a great job of getting a release of NumPy out the door --- thanks to much help from many of you). At that point, I realized very clearly that what I could best do was to try and get more resources for open source and for the NumPy stack rather than work on the code directly. We also did work with several clients that helped me realize just how many disruptive changes had happened from 1.4 to 1.7 for extensive users of NumPy (much more than would be justified from a "we don't break the ABI" mantra that was the stated goal).
We also realized that the kind of experimentation we wanted to do in the first 2 years of Continuum would just not be possible on the NumPy code-base and the need for getting community buy-in on every decision would slow us down too much --- as we had to iterate rapidly on so many things and find our center as a startup. It also would not be fair to the NumPy community.

Our decision to do *all* of our exploration outside the NumPy code base was basically because:

1) the kinds of changes we wanted ultimately were potentially dramatic and disruptive,
2) it would be too difficult and time-consuming to decide all things in public discussions with the NumPy community --- especially when some things were experimental,
3) tying ourselves to releases of NumPy would be difficult at that time, and
4) the design of the NumPy code-base makes it difficult to contribute to --- both Mark W and Bryan V felt they could make progress *much* faster in a new code-base.

Continuum did not have enough start-up funding to devote significant time on DyND in the early days. So Mark rallied what resources he could and we supported him the best we could and he made progress. My only real requirement with sponsoring his work when we did
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
On Tue, Aug 25, 2015 at 3:58 PM, Charles R Harris charlesr.har...@gmail.com wrote:

On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant tra...@continuum.io wrote:

Thanks for the write-up Nathaniel. There is a lot of great detail and interesting ideas here.

snip

There are at least 3 areas of compatibility (ABI, API, and semantic).

ABI-compatibility is a non-feature in today's world. There are so many distributions of the NumPy stack (and conda makes it trivial for anyone to build their own or for you to build one yourself). Making less-optimal software-engineering choices because of fear of breaking the ABI is not something I'm supportive of at all. We should not break ABI every release, but a release every 3 years that breaks ABI is not a problem.

API compatibility should be much more sacrosanct, but it is also something that can be managed. Any NumPy 2.0 should definitely support the full NumPy API (though there could be deprecated swaths). I think the community has done well in using deprecation and limiting the public API to make this more manageable and I would love to see a NumPy 2.0 that solidifies a future-oriented API along with a back-ward compatible API that is also available.

Semantic compatibility is the hardest. We have already broken this on multiple occasions throughout the 1.x NumPy releases. Every time you change the code, this can change. This is what I fear causing deep instability over the course of many years. These are things like the casting rule details, the effect of indexing changes, any change to the calculations approaches. It is and has been the most at risk during any code-changes. My view is that a NumPy 2.0 (with a new low-level architecture) minimizes these changes to a single release rather than unavoidably spreading them out over many, many releases.

I think that summarizes my main concerns. I will write-up more forward thinking ideas for what else is possible in the coming weeks.
In the meantime, thanks for keeping the discussion going. It is extremely exciting to see the help people have continued to provide to maintain and improve NumPy. It will be exciting to see what the next few years bring as well. I think the only thing that looks even a little bit like a numpy 2.0 at this time is dynd. Rewriting numpy, let alone producing numpy 2.0 is a major project. Dynd is 2.5+ years old, 3500+ commits in, and still in progress. If there is a decision to pursue Dynd I could support that, but I think we would want to think deeply about how to make the transition as painless as possible. It would be good at this point to get some feedback from people currently using dynd. IIRC, part of the reason for starting dynd was the perception that it was not possible to evolve numpy without running into compatibility road blocks. Travis, could you perhaps summarize the thinking that went into the decision to make dynd a separate project? Thanks Chuck. I'll do this in a separate email, but I just wanted to point out that when I say NumPy 2.0, I'm actually only specifically talking about a release of NumPy that breaks ABI compatibility --- not some potential re-write. I'm not ruling that out, but I'm not necessarily implying such a thing by saying NumPy 2.0. snip Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io
Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
rewriting the world. Obviously there are still a lot of details to work out, though. But overall, there was widespread agreement that this is one of the #1 pain points for our users (e.g. it's the single main request from pandas), and fixing it is very high priority. Some features that would become straightforward to implement (e.g. even in third-party libraries) if this were fixed: - missing value support - physical unit tracking (meters / seconds - array of velocity; meters + seconds - error) - better and more diverse datetime representations (e.g. datetimes with attached timezones, or using funky geophysical or astronomical calendars) - categorical data - variable length strings - strings-with-encodings (e.g. latin1) - forward mode automatic differentiation (write a function that computes f(x) where x is an array of float64; pass that function an array with a special dtype and get out both f(x) and f'(x)) - probably others I'm forgetting right now I should also note that there was one substantial objection to this plan, from Travis Oliphant (in discussions later in the conference). I'm not confident I understand his objections well enough to reproduce them here, though -- perhaps he'll elaborate. Money = There was an extensive discussion on the topic of: if we had money, what would we do with it? This is partially motivated by the realization that there are a number of sources that we could probably get money from, if we had a good story for what we wanted to do, so it's not just an idle question. Points of general agreement: - Doing the in-person meeting was a good thing. We should plan do that again, at least once a year. So one thing to spend money on is travel subsidies to make sure that happens and is productive. 
- While it's tempting to imagine hiring junior people for the more frustrating/boring work like maintaining buildbots, release infrastructure, updating docs, etc., this seems difficult to do realistically with our current resources -- how do we hire for this, who would manage them, etc.? - On the other hand, the general feeling was that if we found the money to hire a few more senior people who could take care of themselves more, then that would be good and we could realistically absorb that extra work without totally unbalancing the project. - A major open question is how we would recruit someone for a position like this, since apparently all the obvious candidates who are already active on the NumPy team already have other things going on. [For calibration on how hard this can be: NYU has apparently had an open position for a year with the job description of come work at NYU full-time with a private-industry-competitive-salary on whatever your personal open-source scientific project is (!) and still is having an extremely difficult time filling it: [http://cds.nyu.edu/research-engineer/]] - General consensus though was that there isn't much to be done about this though, except try it and see. - (By the way, if you're someone who's reading this and potentially interested in like a postdoc or better working on numpy, then let's talk...) More specific changes to numpy that had general consensus, but don't really fit into a high-level roadmap = - Resolved: we should merge multiarray.so and umath.so into a single extension module, so that they can share utility code without the current awkward contortions. - Resolved: we should start hiding new fields in the ufunc and dtype structs as soon as possible going forward. (I.e. they would not be present in the version of the structs that are exposed through the C API, but internally we would use a more detailed struct.) 
- Maybe we should even go ahead and hide the subset of the existing fields that are really internal details that no-one should be using. If we did this without changing anything else then it would preserve ABI (the fields would still be where existing compiled extensions expect them to be, if any such extensions exist) while breaking API (trying to compile such extensions would give a clear error), so would be a smoother ramp if we think we need to eventually break those fields for real. (As discussed above, there are a bunch of fields in the dtype base class that only make sense for specific dtype subclasses, e.g. only record dtypes need a list of field names, but right now all dtypes have one anyway. So it would be nice to remove these from the base class entirely, but that is potentially ABI-breaking.) - Resolved
Re: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X
Only on Windows does free Anaconda link against the MKL. But, you are correct, that the MKL-linked binaries can only be re-distributed if the person or entity doing the re-distribution has a valid MKL license from Intel. Microsoft has actually released their Visual Studio 2008 compiler stack so that OpenBLAS and ATLAS could be compiled on Windows for these platforms as well. I would be very interested to see conda packages for these libraries which should be pretty straightforward to build. -Travis On Wed, Oct 8, 2014 at 1:12 PM, Carl Kleffner cmkleff...@gmail.com wrote: Hi Travis, the Anaconda binaries (free packages as well as the non-free addons) link against Intel MKL - not against ATLAS. Are these binaries really freely redistributable as stated? The lack of numpy/scipy 64bit windows binaries with opensource blas/lapack was one of the main reasons to start with the development of a dedicated mingw-w64 based compiler toolchain to support OpenBLAS / ATLAS based binaries on windows. Cheers, carlkl 2014-10-08 1:32 GMT+02:00 Travis Oliphant tra...@continuum.io: Hey Andrew, You can use any of the binaries from Anaconda and redistribute them as long as you cite Anaconda --- i.e. tell your users that they are using Anaconda-derived binaries. The Anaconda binaries link against ATLAS. The binaries are all at http://repo.continuum.io/pkgs/ In case you weren't aware: Another way you can build and distribute an application is to build a 'conda' meta-package which lists all the dependencies. If you add to this meta-package 1) an icon and 2) an entry-point, then your application will automatically show up in the Anaconda Launcher (see this blog-post: http://www.continuum.io/blog/new-launcher ) and anyone with the Anaconda Launcher app can install/update your package by clicking on the icon next to it. Users can also install your package with conda install or using the conda-gui.
Best, -Travis On Mon, Oct 6, 2014 at 11:54 AM, Andrew Collette andrew.colle...@gmail.com wrote: Hi all, I am working with the HDF Group on a new open-source viewer program for HDF5 files, powered by NumPy, h5py, and wxPython. On Windows, since people don't typically have Python installed, we are looking to distribute the application using PyInstaller, which embeds dependencies like NumPy. Likewise for OS X (using Py2App). We would like to make sure we don't accidentally include non-open-source components... I recall there was some discussion here about using the Intel math libraries for binary releases on various platforms. Do the releases on SourceForge or PyPI use any proprietary code? We'd like to avoid building NumPy ourselves if we can avoid it. Apologies if this is explained somewhere, but I couldn't find it. Thanks! Andrew Collette ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Travis Oliphant CEO Continuum Analytics, Inc. http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Travis Oliphant CEO Continuum Analytics, Inc. http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X
Ah, yes, I hadn't realized that OpenBLAS could not be compiled with Visual Studio. Thanks for that explanation. Also, I had heard that 32bit mingw on Windows could still produce 64-bit binaries. It looks like there are OpenBLAS binaries available for Windows 32 and Windows 64 (two flavors). It should be straightforward to take those binaries and make conda (or wheel) packages out of them. A good mingw64 stack for Windows would be great and benefits many communities. On Wed, Oct 8, 2014 at 4:46 PM, Sturla Molden sturla.mol...@gmail.com wrote: Travis Oliphant tra...@continuum.io wrote: Microsoft has actually released their Visual Studio 2008 compiler stack so that OpenBLAS and ATLAS could be compiled on Windows for these platforms as well. I would be very interested to see conda packages for these libraries which should be pretty straightforward to build. OpenBLAS does not compile with Microsoft compilers because of AT&T assembly syntax. You need to use a GNU compiler and you also need to have a GNU environment. OpenBLAS is easy to build on Windows with MinGW (with gfortran) and MSYS. Carl's toolchain ensures that the binaries are compatible with the Python binaries from Python.org. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Travis Oliphant CEO Continuum Analytics, Inc. http://www.continuum.io
Re: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X
Hey Andrew, You can use any of the binaries from Anaconda and redistribute them as long as you cite Anaconda --- i.e. tell your users that they are using Anaconda-derived binaries. The Anaconda binaries link against ATLAS. The binaries are all at http://repo.continuum.io/pkgs/ In case you weren't aware: Another way you can build and distribute an application is to build a 'conda' meta-package which lists all the dependencies. If you add to this meta-package 1) an icon and 2) an entry-point, then your application will automatically show up in the Anaconda Launcher (see this blog-post: http://www.continuum.io/blog/new-launcher ) and anyone with the Anaconda Launcher app can install/update your package by clicking on the icon next to it. Users can also install your package with conda install or using the conda-gui. Best, -Travis On Mon, Oct 6, 2014 at 11:54 AM, Andrew Collette andrew.colle...@gmail.com wrote: Hi all, I am working with the HDF Group on a new open-source viewer program for HDF5 files, powered by NumPy, h5py, and wxPython. On Windows, since people don't typically have Python installed, we are looking to distribute the application using PyInstaller, which embeds dependencies like NumPy. Likewise for OS X (using Py2App). We would like to make sure we don't accidentally include non-open-source components... I recall there was some discussion here about using the Intel math libraries for binary releases on various platforms. Do the releases on SourceForge or PyPI use any proprietary code? We'd like to avoid building NumPy ourselves if we can avoid it. Apologies if this is explained somewhere, but I couldn't find it. Thanks! Andrew Collette ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Travis Oliphant CEO Continuum Analytics, Inc. http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type
This could actually be done by using the structured dtype pretty easily. The hard work would be improving the ufunc and generalized ufunc mechanism to handle structured data-types. Numba actually provides some of this already, so if you have NumPy + Numba you can do this sort of thing now. -Travis On Wed, Sep 24, 2014 at 12:08 PM, Chris Barker chris.bar...@noaa.gov wrote: On Tue, Sep 23, 2014 at 4:40 AM, Eric Moore e...@redtetrahedron.org wrote: Improving the dtype system requires working on c code. yes -- it sure does. But I think that is a bit of a Red Herring. I'm barely competent in C, and don't like it much, but the real barrier to entry for me is not that it's in C, but that it's really complex and hard to hack on, as it wasn't designed to support custom dtypes, etc. from the start. There is a lot of ugly code in there that has been hacked in to support various functionality over time. If there was a clean dtype-extension system in C, then A) it wouldn't be bad C to write, and B) would be pretty easy to make a Cython-wrapped version. Travis gave a nice vision for the future, but in the meantime, I'm wondering: Could we hack in a generic custom dtype dtype object into the current system that would delegate everything to the dtype object -- in a truly object-oriented way. I'm imagining that this custom dtype object would be a pyObject and thus very hackable, easy to make a new subclass, etc -- essentially like making a new class in python that emulates one of the built-in type interfaces. This would be slow as a dog -- if inside that C loop, numpy would have to call out to python to do anything, maybe as simple as arithmetic, but it would be a clean, extensible system, and a good way for folks to plug in and try out new dtypes when performance didn't matter, or as prototypes for something that would get plugged in at the C level later once the API was worked out. Is this even possible without too much hacking to the current dtype system?
Would it be as simple as adding a bit to the object dtype? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Travis Oliphant CEO Continuum Analytics, Inc. http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
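A rough sketch of the "delegate everything to a Python object" idea, using the object dtype that NumPy already has. The Quantity class and the quantity_array helper are hypothetical names made up for this illustration; they are not part of NumPy, and this is not the dtype extension mechanism being proposed in the thread. It only shows that an object array already calls back into Python for each element, which is slow but easy to hack on:

```python
import numpy as np


class Quantity:
    """Hypothetical scalar carrying a unit (illustration only, not a NumPy type)."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if self.unit != other.unit:
            # meters + seconds should be an error
            raise TypeError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.value + other.value, self.unit)

    def __repr__(self):
        return f"Quantity({self.value!r}, {self.unit!r})"


def quantity_array(values, unit):
    """Hypothetical helper: build a 1-d object array of Quantity instances."""
    out = np.empty(len(values), dtype=object)
    for i, v in enumerate(values):
        out[i] = Quantity(v, unit)
    return out


lengths = quantity_array([1.0, 2.0], "m")
more_lengths = quantity_array([0.5, 0.5], "m")
times = quantity_array([3.0, 4.0], "s")

# The add ufunc loops over the elements and calls Quantity.__add__ for each one.
total = lengths + more_lengths

# Mixing units fails inside the Python-level __add__.
mismatch_error = None
try:
    lengths + times
except TypeError as exc:
    mismatch_error = str(exc)
```

Every element operation goes through the interpreter here, so this is only useful for prototyping semantics, which is the trade-off described above.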
Re: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type
On Sun, Sep 21, 2014 at 6:50 PM, Stephan Hoyer sho...@gmail.com wrote: pandas has some hacks to support custom types of data that numpy can't handle well enough or at all. Examples include datetime and Categorical [1], and others like GeoArray [2] that haven't made it into pandas yet. Most of these look like numpy arrays but with custom dtypes and type specific methods/properties. But clearly nobody is particularly excited about writing the C necessary to implement custom dtypes [3]. Nor do we need the ndarray ABI. In many cases, writing C may not actually even be necessary for performance reasons, e.g., categorical can be fast enough just by wrapping an integer ndarray for the internal storage and using vectorized operations. And even if it is necessary, I think we'd all rather write Cython than C. It's great for pandas to write its own ndarray-like wrappers (*not* subclasses) that work with pandas, but it's a shame that there isn't a standard interface like the ndarray to make these arrays useable for the rest of the scientific Python ecosystem. For example, pandas has loads of fixes for np.datetime64, but nobody seems to be up for porting them to numpy (I doubt it would be easy). I know these sort of concerns are not new, but I wish I had a sense of what the solution looks like. Is anyone actively working on these issues? Does the fix belong in numpy, pandas, blaze or a new project? I'd love to get a sense of where things stand and how I could help -- without writing any C :). Hey Stephan, There are no easy answers to your questions. The reason is that NumPy's dtype system is not extensible enough with its fixed set of builtin data-types and its bolted-on user-defined datatypes. The implementation was adapted from the *descriptor* notion that was in Numeric (written almost 20 years ago).
While a significant improvement over Numeric, the dtype system in NumPy still has several limitations: 1) it was not designed to add new fundamental data-types without breaking the ABI (most of the ABI breakage between 1.3 and 1.7 due to the addition of np.datetime has been pushed to a small corner but it is still there). 2) The user-defined data-type system which is present is not well tested and likely incomplete: it was the best I could come up with at the time NumPy first came out with a bit of input from people like Fernando Perez and Francesc Alted. 3) It is far easier than in Numeric to add new data-types (that was a big part of the effort of NumPy), but it is still not as easy as one would like to add new data-types (either fundamental ones requiring recompilation of NumPy or 'user-defined' data-types requiring C-code). I believe this system has served us well, but it needs to be replaced eventually. I think it can be replaced fairly seamlessly in a largely backward compatible way (though requiring re-compilation of dependencies). Fixing the dtype system is a fundamental effort behind several projects we are working on at Continuum: datashape, dynd, and numba. These projects are addressing fundamental limitations in a way that can lead to a significantly improved framework for scientific and tabular computing in Python. In the meantime, NumPy can continue to improve in small ways and in orthogonal ways (like the new __numpy_ufunc__ mechanism which allows ufuncs to work more seamlessly with different kinds of array-like objects). This kind of effort, as well as the improved buffer protocol in Python, means that multiple array-like objects can co-exist and use each other's data. Right now, I think that is the best current way to address the data-type limitations of NumPy. Another small project is possible today --- one could today use Numba or Cython to generate user-defined data-types for existing NumPy.
That would be an interesting project and would certainly help to understand the limitations of the user-defined data-type framework without making people write C-code. You could use a meta-class and some code-generation techniques so that by defining a particular class you end-up with a user-defined data-type for NumPy. Even while we have been addressing the fundamental limitations of NumPy with our new tools at Continuum, replacing NumPy is a big undertaking because of its large user-base. While I personally think that NumPy could be replaced for new users as early as next year with a combination of dynd and numba, the big install base of NumPy means that many people (including the company I work with, Continuum) will be supporting NumPy 1.X and Pandas and the rest of the NumPy-Stack for many years to come. So, even if you see me working and advocating new technology, that should never be construed as somehow ignoring or abandoning the current technology base. I remain deeply interested in the success of the scientific computing community --- even though I am not currently contributing a lot of code directly myself.As
Re: [Numpy-discussion] SciPy 2014 BoF NumPy Participation
projects can use as a data-type-description mini-language: https://github.com/ContinuumIO/datashape I think that a really good project for an enterprising young graduate student, post-doc, or professor (who is willing to delay their PhD or risk their tenure) would be to re-write the ufunc system using more modern techniques and put generalized ufuncs front and center as Nathaniel described. It sounds like many agree that we can improve the ufunc object implementation. A new ufunc system is an entirely achievable goal and could even be shipped as an add-on project external from NumPy for several years before being adopted fully. I know at least 4 people with demo-ware versions of a new ufunc-object that could easily replace current NumPy ufuncs eventually. If you are interested in that, I would love to share what I know with you. After spending quite a bit of time thinking about this over the past 2 years, interacting with many in the user community outside of this list, and working with people as they explore a few options --- I do have a fair set of opinions. But, there are also a lot of possibilities and many opportunities. I'm looking forward to seeing what emerges in the coming months and years and cooperating where possible with others having overlapping interests. Best, -Travis On Tue, Jun 3, 2014 at 6:08 PM, Kyle Mandli kyle.man...@gmail.com wrote: Hello everyone, As one of the co-chairs in charge of organizing the birds-of-a-feather sessions at the SciPy conference this year, I wanted to solicit through the NumPy list to see if we could get enough interest to hold a NumPy centered BoF this year. The BoF format would be up to those who would lead the discussion, a couple of ideas used in the past include picking out a few of the lead devs to be on a panel and have a Q&A type of session or an open Q&A with perhaps audience guided list of topics.
I can help facilitate organization of something but we would really like to get something organized this year (last year NumPy was the only major project that was not really represented in the BoF sessions). Thanks! Kyle Mandli (and via proxy Matt McCormick) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Travis Oliphant CEO Continuum Analytics, Inc. http://www.continuum.io
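For readers unfamiliar with the "generalized ufunc" idea mentioned in this message, here is a minimal sketch. It assumes a NumPy version new enough to accept the signature argument of np.vectorize, and the helper name _inner_1d is made up for illustration. It runs a Python-level loop, so it only demonstrates the core-dimension signature semantics; it is not the compiled ufunc machinery or the new ufunc system discussed above:

```python
import numpy as np


def _inner_1d(a, b):
    """Core operation on two 1-d vectors of equal length; returns a scalar."""
    return np.sum(a * b)


# "(n),(n)->()" says: consume one core dimension from each input, produce a scalar.
# Any leading dimensions are treated as loop dimensions and broadcast together.
inner = np.vectorize(_inner_1d, signature="(n),(n)->()")

x = np.arange(6.0).reshape(2, 3)  # two vectors of length 3
y = np.ones(3)                    # one vector, broadcast against both rows of x

result = inner(x, y)  # one inner product per row of x
```

The point of the signature is that the author only writes the core operation on the smallest meaningful shapes and the looping and broadcasting are handled for them.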
Re: [Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)
Believe me, I'm all for incremental changes if it is actually possible and doesn't actually cost more. It's also why I've been silent until now about anything we are doing being a candidate for a NumPy 2.0. I understand the challenges of getting people to change. But, features and solid improvements *will* get people to change --- especially if their new library can be used along with the old library and the transition can be done gradually. Python 3's struggle is the lack of features. At some point there *will* be a NumPy 2.0. What features go into NumPy 2.0, how much backward compatibility is provided, and how much porting is needed to move your code from NumPy 1.X to NumPy 2.X is the real user question --- not whether it is characterized as incremental change or re-write. What I call a re-write and what you call an incremental-change are two points on a spectrum and likely overlap significantly if we really compared what we are thinking about. One huge benefit that came out of the numeric / numarray / numpy transition that we mustn't forget about was actually the extended buffer protocol and memory view objects. This really does allow multiple array objects to co-exist and libraries to use the object that they prefer in a way that did not exist when Numarray / numeric / numpy came out. So, we shouldn't be afraid of that world. The existence of easy package managers to update environments to try out new features and have applications on a single system that use multiple versions of the same library is also something that didn't exist before and that will make any transition easier for users. One thing I regret about my working on NumPy originally is that I didn't have the foresight, skill, and understanding to work more on a more extended and better designed multiple-dispatch system so that multiple array objects could participate together in an expression flow. The __numpy_ufunc__ mechanism gives enough capability in that direction that it may be better now.
Ultimately, I don't disagree that NumPy can continue to exist in incremental change mode ( though if you are swapping out whole swaths of C-code for Cython code --- it sounds a lot like a re-write) as long as there are people willing to put the effort into changing it. I think this is actually benefited by the existence of other array objects that are pushing the feature envelope without the constraints --- in much the same way that the Python standard library is benefitted by many versions of different capabilities being tried out before moving into the standard library. I remain optimistic that things will continue to improve in multiple ways --- if a little messier than any of us would conceive individually. It *is* great to see all the PR's coming from multiple people on NumPy and all the new energy around improving things whether great or small. Best, -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
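To make the point about the extended buffer protocol and memory views concrete, here is a small sketch of two different array-like objects sharing one block of memory without copying. The standard-library array.array stands in for "some other array object"; the variable names are only illustrative:

```python
import array

import numpy as np

# A non-NumPy container that exports the buffer protocol.
raw = array.array("d", [1.0, 2.0, 3.0])

# memoryview exposes the underlying buffer; np.frombuffer wraps it without copying.
view = memoryview(raw)
shared = np.frombuffer(view, dtype=np.float64)

# Writes through either object are visible through the other.
raw[0] = 10.0
shared[1] = 20.0

# The reverse direction works as well: a NumPy array exports its shape
# through a memoryview that other libraries can consume.
grid = np.arange(6, dtype=np.int32).reshape(2, 3)
grid_view = memoryview(grid)
```

After this runs, shared[0] is 10.0 and raw[1] is 20.0, since both names refer to the same memory.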
Re: [Numpy-discussion] PEP 465 has been accepted / volunteers needed
Congratulations! This is definitely a big step for array-computing with Python. Working with the Python devs to implement a PEP can be a tremendous opportunity to increase your programming awareness and ability --- as well as make some good friends. This is a great way to get involved with both Python and the NumPy community and have a big impact. If you are in a position to devote several hours a week to the task, then you won't find a better opportunity to contribute. Best, -Travis On Apr 7, 2014 6:24 PM, Nathaniel Smith n...@pobox.com wrote: Hey all, Guido just formally accepted PEP 465: https://mail.python.org/pipermail/python-dev/2014-April/133819.html http://legacy.python.org/dev/peps/pep-0465/#implementation-details Yay. The next step is to implement it, in CPython and in numpy. I have time to advise on this, but not to do it myself, so, any volunteers? Ever wanted to hack on the interpreter itself, with BDFL guarantee your patch will be accepted (if correct)? The todo list for CPython is here: http://legacy.python.org/dev/peps/pep-0465/#implementation-details There's one open question which is where the type slots should be added. I'd just add them to PyNumberMethods and then if someone objects during patch review it can be changed. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] It looks like Py 3.5 will include a dedicated infix matrix multiply operator
Congratulations Nathaniel! This is great news! Well done on starting the process and taking things forward. Travis On Mar 14, 2014 7:51 PM, Nathaniel Smith n...@pobox.com wrote: Well, that was fast. Guido says he'll accept the addition of '@' as an infix operator for matrix multiplication, once some details are ironed out: https://mail.python.org/pipermail/python-ideas/2014-March/027109.html http://legacy.python.org/dev/peps/pep-0465/ Specifically, we need to figure out whether we want to make an argument for a matrix power operator (@@), and what precedence/associativity we want '@' to have. I'll post two separate threads to get feedback on those in an organized way -- this is just a heads-up. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
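For reference, a short sketch of what the accepted operator looks like in use, assuming Python 3.5 or later and a NumPy release that implements it. The Diagonal class is a hypothetical example of a third-party type opting in through the __matmul__ hook from PEP 465:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

# '@' is matrix multiplication; '*' stays elementwise.
c = a @ b      # [[2, 1], [4, 3]]
elementwise = a * b  # [[0, 2], [3, 0]]


class Diagonal:
    """Hypothetical diagonal-matrix type storing only the diagonal entries."""

    def __init__(self, diag):
        self.diag = np.asarray(diag)

    def __matmul__(self, other):
        # Multiplying by a diagonal matrix on the left scales each row.
        return self.diag[:, None] * np.asarray(other)


scaled = Diagonal([2, 3]) @ a  # [[2, 4], [9, 12]]
```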
Re: [Numpy-discussion] Indexing changes in 1.9
Hey Sebastian, I didn't mean to imply that you would need to necessarily work on it. But the work Jay has done could use review. There are also conversations to have about what to do to resolve the ambiguity that led to the current behavior. Thank you for all the great work on the indexing code paths. Is there a roadmap for 1.9? Travis On Feb 3, 2014 1:26 PM, Sebastian Berg sebast...@sipsolutions.net wrote: On Sun, 2014-02-02 at 13:11 -0600, Travis Oliphant wrote: This sounds like great and welcome work and improvements. Does it make sense to also do something about the behavior of advanced indexing when slices are interleaved between lists and integers? I know that jay borque has some preliminary work to fix this. There are some straightforward fixes -- like doing iterative application of indexing in those cases which would be more sensical in the cases where current code gets tripped up. I guess you are talking about the funky transposing logic and maybe the advanced indexing logic as such? I didn't really think about changing any of that, not sure if we easily can? Personally, I always wondered if it would make sense to add some new type of indexing mechanism to switch to R/matlab style non-advanced integer-array indexing. I don't think this will make it substantially easier to do (the basic logic remains the same -- we need an extra/different preparation and then transpose the result differently), though it might be a bit more obvious where/how to plug it in. But it seems very unlikely I will look into that in the near future (but if someone wants hints on how to go about it, just ask). - Sebastian Travis On Feb 2, 2014 11:07 AM, Charles R Harris charlesr.har...@gmail.com wrote: Sebastian has done a lot of work to refactor/rationalize numpy indexing. The changes are extensive enough that it would be good to have more public review, so here is the release note. The NumPy indexing has seen a complete rewrite in this version.
This makes most advanced integer indexing operations much faster and should have no other implications. However some subtle changes and deprecations were introduced in advanced indexing operations: * Boolean indexing into scalar arrays will always return a new 1-d array. This means that ``array(1)[array(True)]`` gives ``array([1])`` and not the original array. * Advanced indexing into one dimensional arrays used to have (undocumented) special handling regarding repeating the value array in assignments when the shape of the value array was too small or did not match. Code using this will raise an error. For compatibility you can use ``arr.flat[index] = values``, which uses the old code branch. * The iteration order over advanced indexes used to be always C-order. In NumPy 1.9 the iteration order adapts to the inputs and is not guaranteed (with the exception of a *single* advanced index which is never reversed for compatibility reasons). This means that the result is undefined if multiple values are assigned to the same element. An example for this is ``arr[[0, 0], [1, 1]] = [1, 2]``, which may set ``arr[0, 1]`` to either 1 or 2. * Equivalent to the iteration order, the memory layout of the advanced indexing result is adapted for faster indexing and cannot be predicted. * All indexing operations return a view or a copy. No indexing operation will return the original array object. * In the future Boolean array-likes (such as lists of python bools) will always be treated as Boolean indexes and Boolean scalars (including python `True`) will be a legal *boolean* index. At this time, this is already the case for scalar arrays to allow the general ``positive = a[a > 0]`` to work when ``a`` is zero dimensional. * In NumPy 1.8 it was possible to use `array(True)` and `array(False)` equivalent to 1 and 0 if the result
Re: [Numpy-discussion] Indexing changes in 1.9
This sounds like great and welcome work. Does it make sense to also do something about the behavior of advanced indexing when slices are interleaved between lists and integers? I know that Jay Bourque has some preliminary work to fix this. There are some straightforward fixes -- like doing iterative application of indexing in those cases which would be more sensible in the cases where current code gets tripped up. Travis On Feb 2, 2014 11:07 AM, Charles R Harris charlesr.har...@gmail.com wrote: Sebastian has done a lot of work to refactor/rationalize numpy indexing. The changes are extensive enough that it would be good to have more public review, so here is the release note. The NumPy indexing has seen a complete rewrite in this version. This makes most advanced integer indexing operations much faster and should have no other implications. However some subtle changes and deprecations were introduced in advanced indexing operations: * Boolean indexing into scalar arrays will always return a new 1-d array. This means that ``array(1)[array(True)]`` gives ``array([1])`` and not the original array. * Advanced indexing into one dimensional arrays used to have (undocumented) special handling regarding repeating the value array in assignments when the shape of the value array was too small or did not match. Code using this will raise an error. For compatibility you can use ``arr.flat[index] = values``, which uses the old code branch. * The iteration order over advanced indexes used to be always C-order. In NumPy 1.9 the iteration order adapts to the inputs and is not guaranteed (with the exception of a *single* advanced index which is never reversed for compatibility reasons). This means that the result is undefined if multiple values are assigned to the same element. An example for this is ``arr[[0, 0], [1, 1]] = [1, 2]``, which may set ``arr[0, 1]`` to either 1 or 2. 
* Equivalent to the iteration order, the memory layout of the advanced indexing result is adapted for faster indexing and cannot be predicted. * All indexing operations return a view or a copy. No indexing operation will return the original array object. * In the future Boolean array-likes (such as lists of python bools) will always be treated as Boolean indexes and Boolean scalars (including python `True`) will be a legal *boolean* index. At this time, this is already the case for scalar arrays to allow the general ``positive = a[a > 0]`` to work when ``a`` is zero dimensional. * In NumPy 1.8 it was possible to use `array(True)` and `array(False)` equivalent to 1 and 0 if the result of the operation was a scalar. This will raise an error in NumPy 1.9 and, as noted above, will be treated as a boolean index in the future. * All non-integer array-likes are deprecated, object arrays of custom integer like objects may have to be cast explicitly. * The error reporting for advanced indexing is more informative, however the error type has changed in some cases. (Broadcasting errors of indexing arrays are reported as `IndexError`) * Indexing with more than one ellipsis (`...`) is deprecated. Thoughts? Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
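A few of the release-note items above can be checked directly. The following is a small sketch, assuming a NumPy release in which these changes have already landed (1.9 or later); the variable names are only for illustration, and the duplicate-index case deliberately does not assert which value wins.

```python
import numpy as np

# Boolean indexing into a zero-dimensional array yields a new 1-d array.
scalar_arr = np.array(1)
picked = scalar_arr[np.array(True)]

# The same rule lets the usual filtering idiom work on 0-d input.
a = np.array(5)
positive = a[a > 0]

# Plain advanced assignment does not repeat a too-short value array ...
arr = np.zeros(6)
try:
    arr[[0, 1, 2, 3]] = [1, 2]
    repeat_error = None
except ValueError as exc:
    repeat_error = exc

# ... while the flat iterator keeps the old repeating behaviour.
arr.flat[[0, 1, 2, 3]] = [1, 2]

# Indexing returns a view or a copy, never the original array object.
same_object = arr[...] is arr

# Two values assigned to the same element: which one wins is not guaranteed.
dup = np.zeros((2, 2), dtype=int)
dup[[0, 0], [1, 1]] = [1, 2]
```

Code that relied on the old repeating assignment is the most likely to break; switching those spots to ``arr.flat[index] = values`` is the compatibility route the note itself suggests.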
Re: [Numpy-discussion] ufunc overrides
Hey Blake, To be clear, my blog-post is just a pre-NEP and should not be perceived as something that will transpire in NumPy anytime soon. You should take it as a "hey everyone, I think I know how to solve this problem, but I have no time to do it, but wanted to get the word out to those who might have the time." I think the multi-method approach I outline is the *right* thing to do for NumPy. Another attribute on ufuncs would be a bit of a hack (though easier to implement). But, on the other hand, the current ufunc attributes are also a bit of a hack. While my overall proposal is to make *all* functions in NumPy (and SciPy and Scikits) multimethods, I think it's actually pretty straightforward and a more contained problem to make all *ufuncs* multi-methods. I think that could fit in a summer of code project. I don't think it would be that difficult to make all ufuncs multi-methods that dispatch based on the Python type (they are already multi-methods based on the array dtype). You could basically take the code from Guido's essay or from the PEAK-Rules multi-method implementation or from the links below and integrate it with a wrapped version of the current ufuncs (or do a bit more glue and modify the ufunc_call function in 'C' directly and get nice general multi-methods for ufuncs). Of course, you would need to define a decorator that NumPy users could use to register their multi-method implementation with the ufunc. But, this again would not be too difficult. Look for examples and inspiration at the following places: http://alexgaynor.net/2010/jun/26/multimethods-python/ https://pypi.python.org/pypi/typed.py I really think this would be a great addition to NumPy (it would simplify a lot of cruft around masked arrays, character arrays, etc.) and be quite useful. I wish you the best. I can't promise I will have time to help, but I will try to chime in the best I can. 
Best regards, -Travis On Wed, Jul 10, 2013 at 10:29 PM, Blake Griffith blake.a.griff...@gmail.com wrote: Hello NumPy, Part of my GSoC is compatibility with SciPy's sparse matrices and NumPy's ufuncs. Currently there is no feasible way to do this without changing ufuncs a bit. I've been considering a mechanism to override ufuncs based on checking the ufuncs arguments for a __ufunc_override__ attribute. Then handing off the operation to a function specified by that attribute. I prototyped this in python and did a demo in a blog post here: http://cwl.cx/posts/week-6-ufunc-overrides.html This is similar to a previously discussed, but never implemented change: http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html However it seems like the ufunc machinery might be ripped out and replaced with a true multi-method implementation soon. See Travis' blog post: http://technicaldiscovery.blogspot.com/2013/07/thoughts-after-scipy-2013-and-specific.html So I'd like to make my changes as forward compatible as possible. However I'm not sure what I should even consider here, or how forward compatible my current implementation is. Thoughts? Until then, I'm writing up a nep, it is still pretty incomplete, it can be found here: https://github.com/cowlicks/numpy/blob/ufunc-override/doc/neps/ufunc-overrides.rst ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Travis Oliphant Continuum Analytics, Inc. http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
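To make the "wrapped ufunc plus registration decorator" idea in the reply above concrete, here is a minimal pure-Python sketch. It is not NumPy API and not the code from either blog post: `MultiMethodUfunc`, `register`, and the `Tagged` container are hypothetical names invented for the illustration, and dispatch is a simple linear scan over registered type tuples.

```python
import numpy as np


class MultiMethodUfunc:
    """Hypothetical wrapper that dispatches on the Python types of the arguments."""

    def __init__(self, ufunc):
        self.ufunc = ufunc
        self._registry = {}  # maps a tuple of types to an implementation

    def register(self, *types):
        """Decorator that registers an implementation for the given argument types."""
        def decorator(func):
            self._registry[types] = func
            return func
        return decorator

    def __call__(self, *args, **kwargs):
        # Use the first registered signature whose types match every argument.
        for types, func in self._registry.items():
            if len(types) == len(args) and all(
                isinstance(arg, typ) for arg, typ in zip(args, types)
            ):
                return func(*args, **kwargs)
        # Otherwise fall back to the wrapped ufunc (which dispatches on dtype).
        return self.ufunc(*args, **kwargs)


class Tagged:
    """Hypothetical array-like container standing in for, e.g., a sparse matrix."""

    def __init__(self, values, tag):
        self.values = np.asarray(values)
        self.tag = tag


add = MultiMethodUfunc(np.add)


@add.register(Tagged, Tagged)
def _add_tagged(x, y):
    return Tagged(np.add(x.values, y.values), x.tag)


plain = add(np.array([1, 2]), np.array([3, 4]))
tagged = add(Tagged([1, 2], "m"), Tagged([3, 4], "m"))
```

A real implementation would also need to decide how ambiguous matches and subclasses are ordered, which is where the multimethod libraries linked above do most of their work.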
Re: [Numpy-discussion] timezones and datetime64
Mark Wiebe and I are both still tracking NumPy development and can provide context and even help when needed.Apologies if we've left a different impression. We have to be prudent about the time we spend as we have other projects we are pursuing as well, but we help clients with NumPy issues all the time and are eager to continue to improve the code base. It seems to me that the biggest issue is just the automatic conversion that is occurring on string or date-time input. We should stop using the local time-zone (explicit is better than implicit strikes again) and not use any time-zone unless time-zone information is provided in the string. I am definitely +1 on that. It may be necessary to carry around another flag in the data-type to indicate whether or not the date-time is naive (not time-zone aware) or time-zone aware so that string printing does not print a time-zone if it didn't have one to begin with as well. If others agree that this is the best way forward, then Mark or I can definitely help contribute a patch. Best, -Travis On Wed, Apr 3, 2013 at 9:38 AM, Dave Hirschfeld dave.hirschf...@gmail.comwrote: Nathaniel Smith njs at pobox.com writes: On Wed, Apr 3, 2013 at 2:26 PM, Dave Hirschfeld dave.hirschfeld at gmail.com wrote: This isn't acceptable for my use case (in a multinational company) and I found no reasonable way around it other than bypassing the numpy conversion entirely by setting the dtype to object, manually parsing the strings and creating an array from the list of datetime objects. Wow, that's truly broken. I'm sorry. I'm skeptical that just switching to UTC everywhere is actually the right solution. It smells like one of those solutions that's simple, neat, and wrong. (I don't know anything about calendar-time series handling, so I have no ability to actually judge this stuff, but wouldn't one problem be if you want to know about business days/hours? You lose the original day-of-year once you move everything to UTC.) 
Maybe datetime dtypes should be parametrized by both granularity and timezone? Or we could just declare that datetime64 is always timezone-naive and adjust the code to match? I'll CC the pandas list in case they have some insight. Unfortunately AFAIK no-one who's regularly working on numpy this point works with datetimes, so we have limited ability to judge solutions... please help! -n It think simply setting the timezone to UTC if it's not specified would solve 99% of use cases because IIUC the internal representation is UTC so numpy would be doing no conversion of the dates that were passed in. It was the conversion which was the source of the error in my example. The only potential issue with this is that the dates might take along an incorrect UTC timezone, making it more difficult to work with naive datetimes. e.g. In [42]: d = np.datetime64('2014-01-01 00:00:00', dtype='M8[ns]') In [43]: d Out[43]: numpy.datetime64('2014-01-01T00:00:00+') In [44]: str(d) Out[44]: '2014-01-01T00:00:00+' In [45]: pydate(str(d)) Out[45]: datetime.datetime(2014, 1, 1, 0, 0, tzinfo=tzutc()) In [46]: pydate(str(d)) == datetime.datetime(2014, 1, 1) Traceback (most recent call last): File ipython-input-46-abfc0fee9b97, line 1, in module pydate(str(d)) == datetime.datetime(2014, 1, 1) TypeError: can't compare offset-naive and offset-aware datetimes In [47]: pydate(str(d)) == datetime.datetime(2014, 1, 1, tzinfo=tzutc()) Out[47]: True In [48]: pydate(str(d)).replace(tzinfo=None) == datetime.datetime(2014, 1, 1) Out[48]: True In this case it may be best to have numpy not try to set the timezone at all if none was specified. Given the internal representation is UTC I'm not sure this is feasible though so defaulting to UTC may be the best solution. Regards, Dave ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- --- Travis Oliphant Continuum Analytics, Inc. 
http://www.continuum.io ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
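The workaround Dave describes in the thread above (skip NumPy's string conversion, parse the strings yourself, and store the resulting datetime objects in an object array) might look roughly like the sketch below. The sample strings and the format string are assumptions for the example; the last few lines only show that ordering a naive datetime against a time-zone-aware one fails in plain Python.

```python
import datetime

import numpy as np

# Hypothetical input strings with no time-zone information.
stamps = ["2014-01-01 00:00:00", "2014-06-01 12:30:00"]

# Parse manually so no local-time or UTC conversion is applied.
parsed = [datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]

# dtype=object keeps the naive datetime objects untouched.
naive = np.array(parsed, dtype=object)

# Ordering a naive datetime against an aware one is an error in Python.
aware = parsed[0].replace(tzinfo=datetime.timezone.utc)
try:
    parsed[0] < aware
    mixing_fails = False
except TypeError:
    mixing_fails = True
```

This gives up the speed and compact storage of a datetime64 array, which is why the thread treats it as a stopgap rather than a fix.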
Re: [Numpy-discussion] A new webpage promoting Compiler technology for CPython
We should take this discussion off list. Please email me directly if you have questions. But, we are open to listing all of these tools. On Feb 16, 2013 10:46 AM, Massimo DiPierro massimo.dipie...@gmail.com wrote: Thank you. Should this be listed: https://github.com/mdipierro/ocl ? It is based on meta (which is listed) and pyopencl (which is listed, only used to run with opencl) and has some overlap with Cython and Pyjamas although it is not based on any of them. It is minimalist in scope: it only coverts to C/JS/OpenCL a common subset of those languages. But it does what it advertises. It is written in pure python and implemented and implemented in a single file. Massimo On Feb 16, 2013, at 10:13 AM, Ronan Lamy wrote: Le 16/02/2013 16:08, Massimo DiPierro a écrit : Sorry for injecting... Which page is this about? http://compilers.pydata.org/ Cf. the post I answered to. On Feb 16, 2013, at 9:59 AM, Ronan Lamy wrote: Le 15/02/2013 07:11, Travis Oliphant a écrit : This page is specifically for Compiler projects that either integrate with or work directly with the CPython run-time which is why PyPy is not presently listed. The PyPy project is a great project but we just felt that we wanted to explicitly create a collection of links to compilation projects that are accessible from CPython which are likely less well known. I won't argue here with the exclusion of PyPy, but RPython is definitely compiler technology that runs on CPython 2.6/2.7. For now, it is only accessible from a source checkout of PyPy but that will soon change and pip install rpython isn't far off. Since it's a whole tool chain, it has a wealth of functionalities, though they aren't always well-documented or easy to access from the outside: bytecode analysis, type inference, several GC implementations, a JIT generator, assemblers for several architectures, ... 
Cheers, Ronan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] A new webpage promoting Compiler technology for CPython
I only meant off the NumPy list as it seems this is off-topic for this forum. I thought I made clear in the rest of the paragraph that we would *love* this contribution. I recommend a pull request. If you want to discuss this in public, let's have the discussion over at numfo...@googlegroups.com until a more specific list is created. On Sat, Feb 16, 2013 at 6:14 PM, Fernando Perez fperez@gmail.com wrote: On Sat, Feb 16, 2013 at 3:56 PM, Travis Oliphant tra...@continuum.io wrote: We should take this discussion off list. Just as a bystander interested in this: why? It seems that OCL is within the scope of what's being proposed and another entrant into the vibrant new world of compiler-extended machinery for fast numerical work in cpython, so I suspect I'm not the only numpy user curious to know the answer on-list. I know sometimes there are legitimate reasons to take a discussion off-list, but in this case it seemed to be a perfectly reasonable question that also made me curious (as I only learned of OCL thanks to this discussion). Cheers, f ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] A new webpage promoting Compiler technology for CPython
Hey all, With Numba and Blaze we have been doing a lot of work on what essentially is compiler technology and realizing more and more that we are treading on ground that has been plowed before with many other projects. So, we wanted to create a web-site and perhaps even a mailing list or forum where people could coordinate and communicate about compiler projects, compiler tools, and ways to share efforts and ideas. The website is: http://compilers.pydata.org/ This page is specifically for Compiler projects that either integrate with or work directly with the CPython run-time which is why PyPy is not presently listed. The PyPy project is a great project but we just felt that we wanted to explicitly create a collection of links to compilation projects that are accessible from CPython which are likely less well known. But that is just where we started from. The website is intended to be a community website constructed from a github repository. So, we welcome pull requests from anyone who would like to see the website updated to reflect their related project. Jon Riehl (Mython, PyFront, ROFL, and many other interesting projects) and Stephen Diehl (Blaze) and I will be moderating the pull requests to begin with. But, we welcome others with similar interests to participate in that effort of moderation. The github repository is here: https://github.com/pydata/compilers-webpage This is intended to be a community website for information spreading, and so we welcome any and all contributions. Thank you, Travis Oliphant ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: NumPy 1.7.0rc1 release
Fantastic job everyone! Hats off to you, Ondrej! -Travis On Dec 28, 2012, at 6:02 PM, Ondřej Čertík wrote: Hi, I'm pleased to announce the availability of the first release candidate of NumPy 1.7.0rc1. Sources and binary installers can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.7.0rc1/ We have fixed all issues known to us since the 1.7.0b2 release. The only remaining issue is a documentation improvement: https://github.com/numpy/numpy/issues/561 Please test this release and report any issues on the numpy-discussion mailing list. If there are no more problems, we'll release the final version soon. I'll wait at least a week and please write me an email if you need more time for testing. I would like to thank Sebastian Berg, Ralf Gommers, Han Genuit, Nathaniel J. Smith, Jay Bourque, Gael Varoquaux, Mark Wiebe, Matthew Brett, Skipper Seabold, Peter Cock, Charles Harris, Frederic, Gabriel, Luis Pedro Coelho, Pauli Virtanen, Travis E. Oliphant and cgohlke for sending patches and fixes for this release since 1.7.0b2. Cheers, Ondrej P.S. Source code is uploaded to sourceforge, and I'll upload the rest of the Windows and Mac binaries in a few hours as they finish building. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch)
On Dec 20, 2012, at 7:39 PM, Nathaniel Smith wrote: On Thu, Dec 20, 2012 at 11:46 PM, Matthew Brett matthew.br...@gmail.com wrote: Travis - I think you are suggesting that there should be no one person in charge of numpy, and I think this is very unlikely to work well. Perhaps there are good examples of well-led projects where there is not a clear leader, but I can't think of any myself at the moment. My worry would be that, without a clear leader, it will be unclear how decisions are made, and that will make it very hard to take strategic decisions. Curious; my feeling is the opposite, that among mature and successful FOSS projects, having a clear leader is the uncommon case. GCC doesn't, Glibc not only has no leader but they recently decided to get rid of their formal steering committee, I'm pretty sure git doesn't, Apache certainly doesn't, Samba doesn't really, etc. As usual Karl Fogel has sensible comments on this: http://producingoss.com/en/consensus-democracy.html In practice the main job of a successful FOSS leader is to refuse to make decisions, nudge people to work things out, and then if they refuse to work things out tell them to go away until they do: https://lwn.net/Articles/105375/ and what actually gives people influence in a project is the respect of the other members. The former stuff is stuff anyone can do, and the latter isn't something you can confer or take away with a vote. I will strongly voice my opinion that NumPy does not need an official single leader.What it needs are committed, experienced, service-oriented developers and users who are willing to express their concerns and requests because they are used to being treated well.It also needs new developers who are willing to dive into code, contribute to discussions, tackle issues, make pull requests, and review pull requests.As people do this regularly, the leaders of the project will emerge as they have done in the past. 
Even though I called out three people explicitly --- there are many more contributors to NumPy whose voices deserve attention. But, you don't need me to point out what the GitHub record already makes obvious about who is shepherding NumPy these days. But, the GitHub record is not the only one that matters. I would love to see NumPy developers continue to pay attention to and deeply respect the users (especially of downstream projects that depend on NumPy). I plan to continue using NumPy myself and plan to continue to encourage others around me to contribute patches, fixes and features. Obviously, there are people who have rights to merge pull-requests to the repository. But, this group seems always open to new, willing help. As a practical matter, this group is the head development group of the official NumPy fork. I believe this group will continue to be open enough to new, motivated contributors which will allow it to grow to the degree that such developers are available. Nor do we necessarily have a great track record for executive decisions actually working things out. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch
Hello all, There is a lot happening in my life right now and I am spread quite thin among the various projects that I take an interest in. In particular, I am thrilled to publicly announce on this list that Continuum Analytics has received DARPA funding (to the tune of at least $3 million) for Blaze, Numba, and Bokeh which we are writing to take NumPy, SciPy, and visualization into the domain of very large data sets.This is part of the XDATA program, and I will be taking an active role in it.You can read more about Blaze here: http://blaze.pydata.org. You can read more about XDATA here: http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx I personally think Blaze is the future of array-oriented computing in Python. I will be putting efforts and resources next year behind making that case. How it interacts with future incarnations of NumPy, Pandas, or other projects is an interesting and open question. I have no doubt the future will be a rich ecosystem of interoperating array-oriented data-structures. I invite anyone interested in Blaze to participate in the discussions and development at https://groups.google.com/a/continuum.io/forum/#!forum/blaze-dev or watch the project on our public GitHub repo: https://github.com/ContinuumIO/blaze. Blaze is being incubated under the ContinuumIO GitHub project for now, but eventually I hope it will receive its own GitHub project page later next year. Development of Blaze is early but we are moving rapidly with it (and have deliverable deadlines --- thus while we will welcome input and pull requests we won't have a ton of time to respond to simple queries until at least May or June).There is more that we are working on behind the scenes with respect to Blaze that will be coming out next year as well but isn't quite ready to show yet. As I look at the coming months and years, my time for direct involvement in NumPy development is therefore only going to get smaller. 
As a result it is not appropriate that I remain as head steward of the NumPy project (a term I prefer to BFD12 or anything else). I'm sure that it is apparent that while I've tried to help personally where I can this year on the NumPy project, my role has been more one of coordination, seeking funding, and providing expert advice on certain sections of code.I fundamentally agree with Fernando Perez that the responsibility of care-taking open source projects is one of stewardship --- something akin to public service.I have tried to emulate that belief this year --- even while not always succeeding. It is time for me to make official what is already becoming apparent to observers of this community, namely, that I am stepping down as someone who might be considered head steward for the NumPy project and officially leaving the development of the project in the hands of others in the community. I don't think the project actually needs a new head steward --- especially from a development perspective. Instead I see a lot of strong developers offering key opinions for the project as well as a great set of new developers offering pull requests. My strong suggestion is that development discussions of the project continue on this list with consensus among the active participants being the goal for development. I don't think 100% consensus is a rigid requirement --- but certainly a super-majority should be the goal, and serious changes should not be made with out a clear consensus. I would pay special attention to under-represented people (users with intense usage of NumPy but small voices on this list). There are many of them.If you push me for specifics then at this point in NumPy's history, I would say that if Chuck, Nathaniel, and Ralf agree on a course of action, it will likely be a good thing for the project. 
I suspect that even if only 2 of the 3 agree at one time it might still be a good thing (but I would expect more detail and discussion).There are others whose opinion should be sought as well: Ondrej Certik, Perry Greenfield, Robert Kern, David Cournapeau, Francesc Alted, and Mark Wiebe to name a few.For some questions, I might even seek input from people like Konrad Hinsen and Paul Dubois --- if they have time to give it. I will still be willing to offer my view from time to time and if I am asked. Greg Wilson (of Software Carpentry fame) asked me recently what letter I would have written to myself 5 years ago. What would I tell myself to do given the knowledge I have now? I've thought about that for a bit, and I have some answers. I don't know if these will help anyone, but I offer them as hopefully instructive: 1) Do not promise to not break the ABI of NumPy --- and in fact emphasize that it will be broken at least once in the 1.X series.NumPy was designed to add new data-types --- but not without breaking the ABI.NumPy has
[Numpy-discussion] www.numpy.org home page
For people interested in the www.numpy.org home page: Jon Turner has officially transferred the www.numpy.org domain to NumFOCUS. Thank you, Jon, for this donation and for being a care-taker of the domain-name. We have set up the domain registration to point to numpy.github.com and I've changed the CNAME in that repository to www.numpy.org. I've sent an email to have the numpy.scipy.org page redirect to www.numpy.org. The NumPy home page can still be edited in this repository: g...@github.com:numpy/numpy.org.git. Pull requests are always welcome --- especially pull requests that improve the look and feel of the web-page. Two of the content changes that we need to make a decision about are 1) whether or not to put links to books published (Packt publishing for example has offered a higher percentage of their revenues if we put a prominent link on www.numpy.org) 2) whether or not to accept "Sponsored by" links on the home page for donations to the project (e.g. Continuum Analytics has sponsored Ondrej's release management, other companies have sponsored pull requests, other companies may want to provide donations and we would want to recognize their contributions to the numpy project). These decisions should be made by the NumPy community, which in my mind means the interested people on this list. Who is interested in this kind of discussion? We could have these discussions on this list or on the numfo...@googlegroups.com list and keep this list completely technical (which I prefer, but I will do whatever the consensus is). Best regards, -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8
A big +1 from me --- but I don't know anyone still using 2.4. -Travis On Dec 13, 2012, at 10:34 AM, Charles R Harris wrote: Time to raise this topic again. Opinions welcome. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement
Raul, This is *fantastic work*. While many optimizations were done 6 years ago as people started to convert their code, that kind of report has trailed off in the last few years. I have not seen this kind of speed-comparison for some time --- but I think it's definitely beneficial. NumPy still has quite a bit that can be optimized. I think your example is really great. Perhaps it's worth making a C-API macro out of the short-cut to the attribute string so it can be used by others. It would be interesting to see where your other slow-downs are. I would be interested to see if the slow-math of float64 is hurting you. It would be possible, for example, to do a simple subclass of the ndarray that overloads a[integer] to be the same as array.item(integer). The latter syntax returns python objects (i.e. floats) instead of array scalars. Also, it would not be too difficult to add fast-math paths for int64, float32, and float64 scalars (so they don't go through ufuncs but do scalar-math like the float and int objects in Python). A related thing we've been working on lately which might help you is Numba which might help speed up functions that have code like: a[0] 4 : http://numba.pydata.org. Numba will translate the expression a[0] 4 to a machine-code address-lookup and math operation which is *much* faster when a is a NumPy array. Presently this requires you to wrap your function call in a decorator: from numba import autojit @autojit def function_to_speed_up(...): pass In the near future (2-4 weeks), numba will grow the experimental ability to basically replace all your function calls with @autojit versions in a Python function. I would love to see something like this work: python -m numba filename.py To get an effective autojit on all the filename.py functions (and optionally on all python modules it imports). The autojit works out of the box today --- you can get Numba from PyPI (or inside of the completely free Anaconda CE) to try it out. 
Best, -Travis On Dec 2, 2012, at 7:28 PM, Raul Cota wrote: Hello, First a quick summary of my problem and at the end I include the basic changes I am suggesting to the source (they may benefit others) I am ages behind in times and I am still using Numeric in Python 2.2.3. The main reason why it has taken so long to upgrade is because NumPy kills performance on several of my tests. I am sorry if this topic has been discussed before. I tried parsing the mailing list and also google and all I found were comments related to the fact that such is life when you use NumPy for small arrays. In my case I have several thousands of lines of code where data structures rely heavily on Numeric arrays but it is unpredictable if the problem at hand will result in large or small arrays. Furthermore, once the vectorized operations complete, the values could be assigned into scalars and just do simple math or loops. I am fairly sure the core of my problems is that the 'float64' objects start propagating all over the program data structures (not in arrays) and they are considerably slower for just about everything when compared to the native python float. Conclusion, it is not practical for me to do a massive re-structuring of code to improve speed on simple things like a[0] 4 (assuming a is an array) which is about 10 times slower than b 4 (assuming b is a float) I finally decided to track down the problem and I started by getting Python 2.6 from source and profiling it in one of my cases. By far the biggest bottleneck came out to be PyString_FromFormatV which is a function to assemble a string for a Python error caused by a failure to find an attribute when multiarray calls PyObject_GetAttrString. This function seems to get called way too often from NumPy. The real bottleneck of trying to find the attribute when it does not exist is not that it fails to find it, but that it builds a string to set a Python error. 
In other words, something as simple as a[0] 3.5 internally results in a call to set a Python error. I downloaded NumPy code (for Python 2.6) and tracked down all the calls like this, ret = PyObject_GetAttrString(obj, "__array_priority__"); and changed to if (PyList_CheckExact(obj) || (Py_None == obj) || PyTuple_CheckExact(obj) || PyFloat_CheckExact(obj) || PyInt_CheckExact(obj) || PyString_CheckExact(obj) || PyUnicode_CheckExact(obj)){ /* Avoid expensive calls when I am sure the attribute does not exist */ ret = NULL; } else{ ret = PyObject_GetAttrString(obj, "__array_priority__"); } ( I think I found about 7 spots ) I also noticed (not as bad in my case) that calls to PyObject_GetBuffer also resulted in Python errors being set thus unnecessarily slower code. With this change, something like this,
Re: [Numpy-discussion] Z-ordering (Morton ordering) for numpy
This is pretty cool. Something like this would be interesting to play with. There are some algorithms that are faster with z-order arrays. The code is simple enough and small enough that I could see putting it in NumPy. What do others think? -Travis On Nov 24, 2012, at 1:03 PM, Gamblin, Todd wrote: Hi all, In the course of developing a network mapping tool I'm working on, I also developed some python code to do arbitrary-dimensional z-order (morton order) for ndarrays. The code is here: https://github.com/tgamblin/rubik/blob/master/rubik/zorder.py There is a function to put the elements of an array in Z order, and another one to enumerate an array's elements in Z order. There is also a ZEncoder class that can generate Z-codes for arbitrary dimensions and bit widths. I figure this is something that would be generally useful. Any interest in having this in numpy? If so, what should the interface look like and can you point me to a good spot in the code to add it? I was thinking it might make sense to have a Z-order iterator for ndarrays, kind of like ndarray.flat. i.e.: arr = np.empty([4,4], dtype=int) arr.flat = range(arr.size) for elt in arr.zorder: print elt, 0 4 1 5 8 12 9 13 2 6 3 7 10 14 11 15 Or an equivalent to ndindex: arr = np.empty([4,4], dtype=int) arr.flat = range(arr.size) for ix in np.zindex(arr.shape): print ix, (0, 0) (1, 0) (0, 1) (1, 1) (2, 0) (3, 0) (2, 1) (3, 1) (0, 2) (1, 2) (0, 3) (1, 3) (2, 2) (3, 2) (2, 3) (3, 3) Thoughts? -Todd __ Todd Gamblin, tgamb...@llnl.gov, http://people.llnl.gov/gamblin2 CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
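To make the proposed enumeration concrete, here is a minimal sketch of a zindex-style generator. It is not Todd's implementation (that lives in the rubik repository linked above) and np.zindex does not exist in NumPy; the function name, and the restriction to arrays whose sides are all the same power of two, are assumptions made to keep the example short. It de-interleaves the bits of a running Z-code, with the first axis taking the lowest bit, which reproduces the ordering shown in the message.

```python
import numpy as np


def zindex(shape):
    """Yield index tuples of an array in Z (Morton) order.

    Sketch only: assumes every side has the same power-of-two length.
    """
    ndim = len(shape)
    side = shape[0]
    if side < 1 or side & (side - 1) or any(s != side for s in shape):
        raise ValueError("all sides must be the same power of two")
    bits = side.bit_length() - 1  # bits needed per axis
    for code in range(side ** ndim):
        index = [0] * ndim
        for b in range(bits * ndim):
            if (code >> b) & 1:
                # Bit b of the Z-code belongs to axis (b % ndim),
                # at position (b // ndim) within that axis' index.
                index[b % ndim] |= 1 << (b // ndim)
        yield tuple(index)


arr = np.arange(16).reshape(4, 4)
order = list(zindex(arr.shape))
elements = [int(arr[ix]) for ix in order]
print(order[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
print(elements)   # [0, 4, 1, 5, 8, 12, 9, 13, 2, 6, 3, 7, 10, 14, 11, 15]
```

An iterator such as the proposed arr.zorder would then amount to indexing the array with each tuple in turn, as the second list does.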
[Numpy-discussion] NumFOCUS has received 501(c)3 status
Hello all, I'm really happy to report that NumFOCUS has received its 501(c)3 status from the IRS. You can now make tax-deductible donations to NumFOCUS for the support of NumPy. We will put a NumPy-specific button on the home-page of NumPy soon so you can specifically direct your funds. But, for now you can go to http://numfocus.org/donate and be confident that your funds will support: 1) Continuous integration 2) The John Hunter Technical fellowships (which are awards made to students and post-docs and their mentors who will contribute substantially to a supported project during a 3-18 month period). 3) Equipment grants 4) Development sprints 5) Student travel to conferences 6) Project specific grants For example, most of Ondrej's time to work on the release of NumPy 1.7.0 has been paid for by donations to NumFOCUS from Continuum Analytics. NumFOCUS is also seeking nominations for 5 new board members (to bring the total to 9). If you would like to nominate someone please subscribe to numfo...@googlegroups.com (by sending an email to numfocus+subscr...@googlegroups.com) and then send your nomination. Alternatively, you can email me or one of the other directors directly. Best, -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] 1.7.0 release
Hey all, Ondrej has been tied up finishing his PhD for the past several weeks. He is defending his work shortly and should be available to continue to help with the 1.7.0 release around the first of December. He and I have been in contact during this process, and I've been helping where I can. Fortunately, other NumPy developers have been active closing tickets and reviewing pull requests which has helped the process substantially. The release has taken us longer than we expected, but I'm really glad that we've received the bug-reports and issues that we have seen because it will help the 1.7.0 release be a more stable series. Also, the merging of the Trac issues with Git has exposed over-looked problems as well and will hopefully encourage more Git-focused participation by users. We are targeting getting the final release of 1.7.0 out by mid December (based on Ondrej's availability). But, I would like to find out which issues are seen as blockers by people on this list. I think most of the issues that I had as blockers have been resolved. If there are no more remaining blockers, then we may be able to accelerate the final release of 1.7.0 to just after Thanksgiving. Best regards, -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ticket 2228: Scientific package seeing ABI change in 1.6.x
On Nov 4, 2012, at 1:31 PM, Ralf Gommers wrote: On Wed, Oct 31, 2012 at 1:05 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Oct 30, 2012 at 9:26 PM, Travis Oliphant tra...@continuum.io wrote: The NPY_CHAR is not a real type. There are no type-coercion functions attached to it nor ufuncs nor a full dtype object. However, it is used to mimic old Numeric character arrays (especially for copying a string). It should have been deprecated before changing the ABI. I don't think it was realized that it was part of the ABI (mostly for older codes that depended on Numeric). I think it was just another oversight that inserting type-codes changes this part of the ABI. The positive side is that it's a small part of the ABI and not many codes should depend on it. At this point, I'm not sure what can be done, except to document that NPY_CHAR has been deprecated in 1.7.0 and remove it in 1.8.0 to avoid future ABI difficulties. The short answer is that codes that use NPY_CHAR must be recompiled to be compatible with 1.6.0. IIRC, it was proposed to remove it at one point, but the STScI folks wanted to keep it because their software depended on it. I can't find that discussion in the list archives. If you know who from STScI to ask about this, can you do so? Is replacing NPY_CHAR with NPY_STRING supposed to just work? No, it's a little more complicated than that, but not too much. Code that uses the NPY_CHAR type can be changed fairly easily to use the NPY_STRING type, but it does take some re-writing. The NPY_CHAR field was added so that code written for Numeric (like ScientificPython's netcdf reader) would continue to just work with no changes and behave similarly to how it behaved with Numeric's character type. Unfortunately, adding it to the end of the TYPE list does prevent adding any more types without breaking at least this part of the ABI.
-Travis
Re: [Numpy-discussion] ticket 2228: Scientific package seeing ABI change in 1.6.x
The NPY_CHAR is not a real type. There are no type-coercion functions attached to it nor ufuncs nor a full dtype object. However, it is used to mimic old Numeric character arrays (especially for copying a string). It should have been deprecated before changing the ABI. I don't think it was realized that it was part of the ABI (mostly for older codes that depended on Numeric). I think it was just another oversight that inserting type-codes changes this part of the ABI. The positive side is that it's a small part of the ABI and not many codes should depend on it. At this point, I'm not sure what can be done, except to document that NPY_CHAR has been deprecated in 1.7.0 and remove it in 1.8.0 to avoid future ABI difficulties. The short answer is that codes that use NPY_CHAR must be recompiled to be compatible with 1.6.0. -Travis On Oct 30, 2012, at 8:46 PM, Charles R Harris wrote: On Tue, Oct 30, 2012 at 4:08 PM, Ralf Gommers ralf.gomm...@gmail.com wrote: Hi, Ticket 2228 says ABI was broken in 1.6.x. Specifically, NPY_CHAR in the NPY_TYPES enum seems to have been moved. Can anyone comment on why the 3 datetime related values were inserted instead of appended? I don't know, although having NPY_CHAR after NPY_NTYPES goes back to 1.0.3 NPY_NTYPES, NPY_NOTYPE, NPY_CHAR, /* special flag */ And I expect it was desired to keep it there on the expectation that there was a reason for it. The decision not to append was in 1.4.0 NPY_DATETIME, NPY_TIMEDELTA, NPY_NTYPES, NPY_NOTYPE, NPY_CHAR, /* special flag */ And probably due to Robert Kern or Travis, IIRC who worked on getting it in. I don't see a good way to get around the ABI break, I think the question going forward needs to be whether we leave it after NPY_NTYPES or make it part of the unchanging ABI, and I suspect we need to know what the 'special flag' comment means before we can make that decision.
My suspicion is that it wasn't considered a real numeric type, but rather a flag marking a special string type, in which case it probably doesn't really belong among the types, which I think is also indicated by NPY_NOTYPE. Moving NPY_CHAR could have implications we would want to check, but I'd generally favor moving it all else being equal. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
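Why inserting rather than appending enum members breaks compiled extensions can be shown with a toy model. The sketch below is in Python rather than C and uses a truncated, illustrative tail of the type list: LAST_REAL_TYPE is a placeholder standing in for all earlier members, so the integers are relative positions, not the real NPY_TYPES values.

```python
def number(names):
    """Assign consecutive integers the way a C enum does."""
    return {name: value for value, name in enumerate(names)}


# Illustrative tails only; LAST_REAL_TYPE is a placeholder, not a NumPy name.
before = ["LAST_REAL_TYPE", "NPY_NTYPES", "NPY_NOTYPE", "NPY_CHAR"]
inserted = ["LAST_REAL_TYPE", "NPY_DATETIME", "NPY_TIMEDELTA",
            "NPY_NTYPES", "NPY_NOTYPE", "NPY_CHAR"]

old, new = number(before), number(inserted)

# Members whose integer value differs between the two headers.
shifted = {name: (old[name], new[name])
           for name in before if old[name] != new[name]}
print(shifted)
```

An extension compiled against the first header has the old integer for NPY_CHAR baked into its binary, so a library built from the second header reads that integer as a different member. That mismatch is why a recompile is needed.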
Re: [Numpy-discussion] Issue tracking
On Oct 23, 2012, at 9:58 AM, David Cournapeau wrote: On Tue, Oct 23, 2012 at 5:05 AM, Thouis (Ray) Jones tho...@gmail.com wrote: On Fri, Oct 19, 2012 at 9:34 PM, Thouis (Ray) Jones tho...@gmail.com wrote: On Fri, Oct 19, 2012 at 4:46 PM, Thouis (Ray) Jones tho...@gmail.com wrote: On Fri, Oct 19, 2012 at 11:20 AM, Thouis (Ray) Jones tho...@gmail.com wrote: I started the import with the oldest 75 and newest 125 Trac issues, and will wait a few hours to do the rest to allow feedback, just in case something is broken that I haven't noticed. I did make one change to better emulate Trac behavior. Some Trac usernames are also email addresses, which Trac anonymizes in its display. I decided it was safer to do the same. The import is running again, though I've been having some failures in a few comments and general hangs (these might be network related). I'm keeping track of which issues might have had difficulties. @endolith noticed that I didn't correctly relink #XXX trac id numbers to github id numbers (both trac and github create links automatically), so that will have to be handled by a postprocessing script (which it probably would have, anyway, since the github # isn't known before import). Import has finished. The following trac #s had issues in creating the comments (I think due to network problems): 182, 297, 619, 621, 902, 904, 909 913, 914, 915, 1044, 1526. I'll review them and see if I can pull in anything missing I'll also work on a script for updating the trac crossrefs to github crossrefs. In the no good deed goes unpunished category, I accidentally logged in as myself (rather than numpy-gitbot) and pushed about 500 issues, so now I receive updates whenever one of them gets changed. At least most of them were closed, already... I just updated the cross-issue-references to use github rather than Trac id numbers. Stupidly, I may have accidentally removed comments that were added in the last few days to issues moved from trac to github. 
Hopefully not, or at least not many. It's probably a good idea to turn off Trac, soon, to keep too many new bugs from needing to be ported, and old bugs being commented on. The latter is more of a pain to deal with. I will look into making the NumPy trac read-only. It should not be too complicated to extend Pauli's code to redirect the tickets part to github issues. Have we decided what to do with the wiki content ? I believe there is a wiki dump command in trac wiki. We should put that content linked off the numpy pages at github. Thanks for helping with this. -Travis David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
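The cross-reference post-processing Ray describes (rewriting #XXX Trac ids as GitHub issue ids once the import has assigned them) might be sketched as below. This is not Ray's script: the function name relink, the mapping, and the GitHub numbers in it are all hypothetical, and ids missing from the mapping are deliberately left untouched.

```python
import re


def relink(text, trac_to_github):
    """Rewrite '#<trac id>' references using a {trac_id: github_id} mapping."""

    def swap(match):
        github_id = trac_to_github.get(int(match.group(1)))
        # Leave the reference alone when the Trac id is not in the mapping.
        return match.group(0) if github_id is None else "#%d" % github_id

    return re.sub(r"#(\d+)\b", swap, text)


# Hypothetical mapping; real values are only known after the import runs.
mapping = {182: 2345, 297: 2460}
comment = "Duplicate of #182, see also #297 and #9999."
print(relink(comment, mapping))
```

Because the substitution is driven by a single mapping, it can be re-run safely only on text that still contains Trac ids; running it twice over already converted comments could remap a GitHub number that happens to equal a Trac id.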
Re: [Numpy-discussion] Issue tracking
Kudos! Ray. Very impressive and useful work. -Travis On Oct 19, 2012, at 10:20 AM, Thouis (Ray) Jones wrote: I started the import with the oldest 75 and newest 125 Trac issues, and will wait a few hours to do the rest to allow feedback, just in case something is broken that I haven't noticed. I did make one change to better emulate Trac behavior. Some Trac usernames are also email addresses, which Trac anonymizes in its display. I decided it was safer to do the same. Ray Jones ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Announcing Anaconda version 1.1
I just wanted to let everyone know about our new release of Anaconda which now has Spyder and Matplotlib working for Mac OS X and Windows. Right now, it's the best way to get the pre-requisites for Numba --- though I recommend getting the latest Numba from github as Numba is still under active development. Anaconda 1.1 Announcement Continuum Analytics, Inc. is pleased to announce the release of Anaconda Pro 1.1, which extends Anaconda’s programming capabilities to the desktop. Anaconda Pro now includes an IDE (Spyder http://spyder-ide.blogspot.com/) and plotting capabilities (Matplotlib http://matplotlib.org/), as well as optimized versions of Numba Pro https://store.continuum.io/cshop/numbapro and IOPro https://store.continuum.io/cshop/iopro. With these enhancements, Anaconda Pro is a complete solution for server-side computation or client-side development. It is equally well-suited for supercomputers or for training in a classroom. Available for Windows, Mac OS X, and Linux, Anaconda is a Python distribution for scientific computing, engineering simulation, and business intelligence data management. It includes the most popular numerical and scientific libraries used by scientists, engineers, and data analysts, with a single integrated and flexible installer. Continuum Analytics offers Enterprise-level support for Anaconda, covering both its open source libraries as well as the included commercial libraries from Continuum. For more information, to download a trial version of Anaconda Pro, or download the completely free Anaconda CE, click here: https://store.continuum.io/cshop/anaconda. Best regards, -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] A change with minor compatibility questions
Hey all, https://github.com/numpy/numpy/pull/482 is a pull request that changes the hash function for numpy void scalars. These are the objects returned from fully indexing a structured array: array[i] if array is a 1-d structured array. Currently their hash function just hashes the pointer to the underlying data. This means that void scalars can be used as keys in a dictionary but the behavior is non-intuitive because another void scalar with the same data but pointing to a different region of memory will hash differently. The pull request makes it so that two void scalars with the same data will hash to the same value (using the same algorithm as a tuple hash). This pull request also only allows read-only scalars to be hashed. There is a small chance this will break someone's code if they relied on this behavior. I don't believe anyone is currently relying on this behavior -- but I've been proven wrong before. What do people on this list think? Should we raise a warning in the next release when a hash function on a void scalar is called, or just make the change, put it in the release notes and make a few people change their code if needed? The problem was identified by a couple of users of NumPy currently which is why I think that people who have tried using numpy void scalars as keys aren't doing it right now but are instead converting them to tuples first. -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
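The workaround mentioned at the end of the message, converting each record to a tuple before using it as a dictionary key, can be sketched as follows. The field names and data are invented for illustration, and the code uses Python 3 syntax; it relies only on the item() method of a structured scalar returning a plain Python tuple.

```python
import numpy as np

# Made-up structured array: records 0 and 1 hold the same data
# but live at different memory addresses.
arr = np.array([(1, 2.0), (1, 2.0), (3, 4.0)],
               dtype=[("a", np.int64), ("b", np.float64)])

counts = {}
for rec in arr:
    key = rec.item()  # plain tuple, hashed by value rather than by pointer
    counts[key] = counts.get(key, 0) + 1

print(counts)  # {(1, 2.0): 2, (3, 4.0): 1}
```

Keying on the tuple gives the value-based behaviour the pull request aims for, independent of how a given NumPy version hashes void scalars.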
Re: [Numpy-discussion] A change with minor compatibility questions
On Oct 17, 2012, at 12:48 PM, Dag Sverre Seljebotn wrote: On 10/17/2012 06:56 PM, Dag Sverre Seljebotn wrote: On 10/17/2012 05:22 PM, Travis Oliphant wrote: Hey all, https://github.com/numpy/numpy/pull/482 is a pull request that changes the hash function for numpy void scalars. These are the objects returned from fully indexing a structured array: array[i] if array is a 1-d structured array. Currently their hash function just hashes the pointer to the underlying data. This means that void scalars can be used as keys in a dictionary but the behavior is non-intuitive because another void scalar with the same data but pointing to a different region of memory will hash differently. The pull request makes it so that two void scalars with the same data will hash to the same value (using the same algorithm as a tuple hash). This pull request also only allows read-only scalars to be hashed. There is a small chance this will break someone's code if they relied on this behavior. I don't believe anyone is currently relying on this behavior -- but I've been proven wrong before. What do people on this list think? I support working on fixing this, but if I understand your fix correctly this change just breaks things in a different way. Specifically, in this example: arr = np.ones(4, dtype=[('a', np.int64)]) x = arr[0] d = { x : 'value' } arr[0]['a'] = 4 print d[x] Does the last line raise a KeyError? If I understand correctly it does. Argh. I overlooked both Travis' second commit, and the explicit mention of read-only above. Isn't it possible to produce a read-only array from a writeable one though, and so get a read-only scalar whose underlying value can still change? Yes, it is possible to do that (just like it is currently possible to change a tuple with a C-extension or even Cython or a string with NumPy). We won't be able to prevent people from writing code that will have odd behavior, but we can communicate correctly about what one should do.
-Travis Anyway, sorry about being so quick to post. Dag Sverre ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
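Dag's concern is that a key which is a view into array memory can change after it has been inserted into a dictionary. One way to sidestep that, sketched below under the assumption that a value snapshot is acceptable, is to key on a tuple copy taken at insertion time; the variable names follow Dag's example, and the syntax is Python 3.

```python
import numpy as np

arr = np.ones(4, dtype=[("a", np.int64)])

# Snapshot the record's value as a tuple; the key no longer refers to arr's memory.
key = arr[0].item()
d = {key: "value"}

# Mutate the underlying array afterwards.
arr["a"][0] = 4

print(d[key])         # still found: the key is an independent copy
print(arr[0].item())  # the array itself now holds the new value
```

The trade-off is that the dictionary no longer tracks the array: looking up the record's current value after the mutation uses a different key.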
Re: [Numpy-discussion] Behavior of .base
On Oct 1, 2012, at 9:11 AM, Jim Bosch wrote: On 09/30/2012 03:59 PM, Travis Oliphant wrote: Hey all, In a github-discussion with Gael and Nathaniel, we came up with a proposal for .base that we should put before this list.Traditionally, .base has always pointed to None for arrays that owned their own memory and to the most immediate array object parent for arrays that did not own their own memory. There was a long-standing issue related to running out of stack space that this behavior created. Recently this behavior was altered so that .base always points to the original object holding the memory (something exposing the buffer interface). This created some problems for users who relied on the fact that most of the time .base pointed to an instance of an array object. The proposal here is to change the behavior of .base for arrays that don't own their own memory so that the .base attribute of an array points to the most original object that is still an instance of the type of the array. This would go into the 1.7.0 release so as to correct the issues reported. What are reactions to this proposal? In the past, I've relied on putting arbitrary Python objects in .base in my C++ to NumPy conversion code to make sure reference counting for array memory works properly. In particular, I've used Python CObjects that hold boost::shared_ptrs, which don't even have a buffer interface. So it sounds like I may be a few steps behind on the rules of what actually should go in .base. This should still work, nothing has been proposed to change this use-case. I'm very concerned that if we do demand that .base always point to a NumPy array (rather than an arbitrary Python object or even just one with a buffer interface), there's no longer any way for a NumPy array to hold data allocated by something other than NumPy. I don't recall a suggestion to demand that .base always point to a NumPy array. 
The suggestion is that a view of a view of an array that has your boost::shared_ptr as a PyCObject pointed to by base will have its base point to the first array instead of the PyCObject (as the recent change made). If I want to put external memory in a NumPy array and indicate that it's owned by some non-NumPy Python object, what is the recommended way to do that? The approach you took is still the way I would recommend doing that. There may be other suggestions. -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Behavior of .base
Hey all, In a github-discussion with Gael and Nathaniel, we came up with a proposal for .base that we should put before this list. Traditionally, .base has always pointed to None for arrays that owned their own memory and to the most immediate array object parent for arrays that did not own their own memory. There was a long-standing issue related to running out of stack space that this behavior created. Recently this behavior was altered so that .base always points to the original object holding the memory (something exposing the buffer interface). This created some problems for users who relied on the fact that most of the time .base pointed to an instance of an array object. The proposal here is to change the behavior of .base for arrays that don't own their own memory so that the .base attribute of an array points to the most original object that is still an instance of the type of the array. This would go into the 1.7.0 release so as to correct the issues reported. What are reactions to this proposal? -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
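The proposed rule can be illustrated with two short chains of views. This sketch assumes a NumPy release that follows the proposal (to my understanding, 1.7 and later); on the releases discussed in this thread the results would differ. The bytearray simply stands in for any non-array object that exposes the buffer interface.

```python
import numpy as np

# Case 1: the memory is owned by an ndarray.
a = np.arange(10)
b = a[2:]
c = b[1:]                 # a view of a view
print(a.base is None)     # True: a owns its memory
print(c.base is a)        # the chain collapses to the owning array

# Case 2: the memory is owned by a non-array object.
buf = bytearray(16)
m = np.frombuffer(buf, dtype=np.uint8)   # m does not own its memory
v = m[4:]
w = v[2:]
print(isinstance(m.base, np.ndarray))    # m.base is the buffer side, not an array
print(w.base is m)        # collapse stops at m, the last object that is still an ndarray
```

In the second case .base never jumps past m to the raw buffer, which is the point of the proposal: code that expects .base to be an array instance keeps working, while the intermediate view v is not kept alive by w.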
Re: [Numpy-discussion] Behavior of .base
We are not talking about changing it back. The change in 1.6 caused problems that need to be addressed. Can you clarify your concerns? The proposal is not a major change to the behavior on master, but it does fix a real issue. -- Travis Oliphant (on a mobile) 512-826-7480 On Sep 30, 2012, at 3:30 PM, Han Genuit hangen...@gmail.com wrote: On Sun, Sep 30, 2012 at 9:59 PM, Travis Oliphant tra...@continuum.io wrote: Hey all, In a github-discussion with Gael and Nathaniel, we came up with a proposal for .base that we should put before this list.Traditionally, .base has always pointed to None for arrays that owned their own memory and to the most immediate array object parent for arrays that did not own their own memory. There was a long-standing issue related to running out of stack space that this behavior created. Recently this behavior was altered so that .base always points to the original object holding the memory (something exposing the buffer interface). This created some problems for users who relied on the fact that most of the time .base pointed to an instance of an array object. The proposal here is to change the behavior of .base for arrays that don't own their own memory so that the .base attribute of an array points to the most original object that is still an instance of the type of the array. This would go into the 1.7.0 release so as to correct the issues reported. What are reactions to this proposal? -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion I think the current behaviour of the .base attribute is much more stable and predictable than past behaviour. For views for instance, this makes sure you don't hold references of 'intermediate' views, but always point to the original *base* object. Also, I think a lot of internal logic depends on this behaviour, so I am not in favour of changing this back (yet) again. 
Also, considering that this behaviour already exists in past versions of NumPy, namely 1.6, and is very fundamental to how arrays work, I find it strange that it is now up for change in 1.7 at the last minute. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Behavior of .base
I think you are misunderstanding the proposal. The proposal is to traverse the views as far as you can but stop just short of having base point to an object of a different type. This fixes the infinite chain of views problem but also fixes the problem sklearn was having with base pointing to an unexpected mmap object. -- Travis Oliphant (on a mobile) 512-826-7480 On Sep 30, 2012, at 3:50 PM, Han Genuit hangen...@gmail.com wrote: On Sun, Sep 30, 2012 at 10:35 PM, Travis Oliphant tra...@continuum.io wrote: We are not talking about changing it back. The change in 1.6 caused problems that need to be addressed. Can you clarify your concerns? The proposal is not a major change to the behavior on master, but it does fix a real issue. -- Travis Oliphant (on a mobile) 512-826-7480 On Sep 30, 2012, at 3:30 PM, Han Genuit hangen...@gmail.com wrote: On Sun, Sep 30, 2012 at 9:59 PM, Travis Oliphant tra...@continuum.io wrote: Hey all, In a github-discussion with Gael and Nathaniel, we came up with a proposal for .base that we should put before this list.Traditionally, .base has always pointed to None for arrays that owned their own memory and to the most immediate array object parent for arrays that did not own their own memory. There was a long-standing issue related to running out of stack space that this behavior created. Recently this behavior was altered so that .base always points to the original object holding the memory (something exposing the buffer interface). This created some problems for users who relied on the fact that most of the time .base pointed to an instance of an array object. The proposal here is to change the behavior of .base for arrays that don't own their own memory so that the .base attribute of an array points to the most original object that is still an instance of the type of the array. This would go into the 1.7.0 release so as to correct the issues reported. What are reactions to this proposal? 
-Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion I think the current behaviour of the .base attribute is much more stable and predictable than past behaviour. For views for instance, this makes sure you don't hold references of 'intermediate' views, but always point to the original *base* object. Also, I think a lot of internal logic depends on this behaviour, so I am not in favour of changing this back (yet) again. Also, considering that this behaviour already exists in past versions of NumPy, namely 1.6, and is very fundamental to how arrays work, I find it strange that it is now up for change in 1.7 at the last minute. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion Well, the current behaviour makes sure you can have an endless chain of views derived from each other without keeping a copy of each view alive. If I understand correctly, you propose to change this behaviour to where it would keep a copy of each view alive.. My concern is that the problems that occurred from the 1.6 change are now seen as paramount above a correct implementation. There are problems with backward compatibility, but most of these are due to lack of documentation and testing. And now there will be a lot of people depending on the new behaviour, which is also something to take into account. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Behavior of .base
-- Travis Oliphant (on a mobile) 512-826-7480 On Sep 30, 2012, at 4:00 PM, Han Genuit hangen...@gmail.com wrote: On Sun, Sep 30, 2012 at 10:55 PM, Travis Oliphant tra...@continuum.io wrote: I think you are misunderstanding the proposal. The proposal is to traverse the views as far as you can but stop just short of having base point to an object of a different type. This fixes the infinite chain of views problem but also fixes the problem sklearn was having with base pointing to an unexpected mmap object. -- Travis Oliphant (on a mobile) 512-826-7480 On Sep 30, 2012, at 3:50 PM, Han Genuit hangen...@gmail.com wrote: On Sun, Sep 30, 2012 at 10:35 PM, Travis Oliphant tra...@continuum.io wrote: We are not talking about changing it back. The change in 1.6 caused problems that need to be addressed. Can you clarify your concerns? The proposal is not a major change to the behavior on master, but it does fix a real issue. -- Travis Oliphant (on a mobile) 512-826-7480 On Sep 30, 2012, at 3:30 PM, Han Genuit hangen...@gmail.com wrote: On Sun, Sep 30, 2012 at 9:59 PM, Travis Oliphant tra...@continuum.io wrote: Hey all, In a github-discussion with Gael and Nathaniel, we came up with a proposal for .base that we should put before this list. Traditionally, .base has always pointed to None for arrays that owned their own memory and to the most immediate array object parent for arrays that did not own their own memory. There was a long-standing issue related to running out of stack space that this behavior created. Recently this behavior was altered so that .base always points to the original object holding the memory (something exposing the buffer interface). This created some problems for users who relied on the fact that most of the time .base pointed to an instance of an array object. 
The proposal here is to change the behavior of .base for arrays that don't own their own memory so that the .base attribute of an array points to the most original object that is still an instance of the type of the array. This would go into the 1.7.0 release so as to correct the issues reported. What are reactions to this proposal? -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion I think the current behaviour of the .base attribute is much more stable and predictable than past behaviour. For views for instance, this makes sure you don't hold references of 'intermediate' views, but always point to the original *base* object. Also, I think a lot of internal logic depends on this behaviour, so I am not in favour of changing this back (yet) again. Also, considering that this behaviour already exists in past versions of NumPy, namely 1.6, and is very fundamental to how arrays work, I find it strange that it is now up for change in 1.7 at the last minute. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion Well, the current behaviour makes sure you can have an endless chain of views derived from each other without keeping a copy of each view alive. If I understand correctly, you propose to change this behaviour to where it would keep a copy of each view alive.. My concern is that the problems that occurred from the 1.6 change are now seen as paramount above a correct implementation. There are problems with backward compatibility, but most of these are due to lack of documentation and testing. And now there will be a lot of people depending on the new behaviour, which is also something to take into account. 
Ah, sorry, I get it. You mean to make sure that base is an object of type ndarray. No problems there. :-)

Yes. Exactly. I realize I didn't explain it very well. For a subtype it would ensure base is a subtype. Thanks for the feedback. Travis
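The rule agreed on in this thread can be checked with a short script. This is a minimal sketch, assuming a NumPy release that includes the change (1.7 or later); the subclass name Tagged is hypothetical and exists only for the illustration.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass, used only for this illustration."""


a = np.arange(10)      # owns its memory, so a.base is None
b = a[2:]              # view of a
c = b[1:]              # view of a view

s = a.view(Tagged)     # subclass view onto a's memory
t = s[1:]              # view of the subclass view

print(a.base is None)          # owner has no base
print(c.base is a)             # chain of plain views collapses to the owner
print(type(t.base).__name__)   # traversal stops before the type would change
```

The intermediate view b is not kept alive by c, which avoids the long chains of references mentioned above, while t.base stays an instance of the subclass rather than jumping to an object of a different type.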
Re: [Numpy-discussion] Behavior of .base
It sounds like there are no objections and this has a strong chance to fix the problems. We will put it on the TODO list for the 1.7.0 release. -Travis

On Sep 30, 2012, at 9:30 PM, Charles R Harris wrote: On Sun, Sep 30, 2012 at 1:59 PM, Travis Oliphant tra...@continuum.io wrote:

Hey all, In a github-discussion with Gael and Nathaniel, we came up with a proposal for .base that we should put before this list. Traditionally, .base has always pointed to None for arrays that owned their own memory and to the most immediate array object parent for arrays that did not own their own memory. There was a long-standing issue related to running out of stack space that this behavior created. Recently this behavior was altered so that .base always points to the original object holding the memory (something exposing the buffer interface). This created some problems for users who relied on the fact that most of the time .base pointed to an instance of an array object. The proposal here is to change the behavior of .base for arrays that don't own their own memory so that the .base attribute of an array points to the most original object that is still an instance of the type of the array. This would go into the 1.7.0 release so as to correct the issues reported. What are reactions to this proposal?

It sounds like this would solve the problem in the short term, but it is a bit of a hack in that the behaviour is more complicated than either the original or the current version. So I could see this in 1.7, but it might be preferable in the long term to work out what attributes are needed to solve Gael's problem more directly. Chuck
Re: [Numpy-discussion] Making numpy sensible: backward compatibility please
Thank you for expressing this voice, Gael. It is an important perspective. The main reason that 1.7 has taken so long to get released is because I'm concerned about these kinds of changes and really want to either remove them or put in adequate warnings prior to moving forward. It's a long and complex process. Thanks for providing feedback when you encounter problems so that we can do our best to address them. I agree that we should be much more cautious about semantic changes in the 1.X series of NumPy. How we handle situations where 1.6 changed things from 1.5 and wasn't reported until now is an open question and depends on the particular problem in question. I agree that we should be much more cautious about changes (particularly semantic changes that will break existing code). -Travis

On Sep 28, 2012, at 4:23 PM, Gael Varoquaux wrote:

Hi numpy developers, First of all, thanks a lot for the hard work you put in numpy. I know very well that maintaining such a core library is a lot of effort and a service to the community. But with great dedication, comes great responsibility :). I find that Numpy is a bit of a wild horse, a moving target. I have just fixed a fairly nasty bug in scikit-learn [1] that was introduced by a change of semantics in ordering when doing copies with numpy. I have been running, working on, and developing scikit-learn while tracking numpy's development tree and, as far as I can tell, I never saw warnings raised in our code that something was going to change, or had changed. In other settings, changes in array inheritance and 'base' propagation have made impossible some of our memmap-related use cases that used to work under previous numpy [2]. Others have been hitting difficulties related to these changes in behavior [3]. Not to mention the new casting rules (default: 'same_kind') that break a lot of code, or the ABI change that, while not done on purpose, ended up causing us a lot of pain.
My point here is that having code that works and gives correct results with new releases of numpy is more challenging than it should be. I cannot claim that I disagree with the changes that I mention above. They were all implemented for a good reason and can all be considered as overall improvements to numpy. However the situation is that given a complex codebase relying on numpy that works at a time t, the chances that it works flawlessly at time t + 1y are thin. I am not too proud that we managed to release scikit-learn 0.12 with a very ugly bug under numpy 1.7. That happened although we have 90% test coverage, buildbots under different numpy versions, and a lot of people, including me, using our development tree on a day to day basis with bleeding edge numpy. Most code in research settings or R&D industry does not benefit from such software engineering and I believe is much more likely to suffer from changes in numpy. I think that this is a cultural issue: priority is not given to stability and backward compatibility. I think that this culture is very much ingrained in the Python world, that likes iteratively cleaning its software design. For instance, I have the feeling that in the scikit-learn, we probably fall in the same trap. That said, such a behavior cannot fare well for a base scientific environment. People tell me that if they take old matlab code, the odds that it will still work are much higher than with Python code. As a geek, I tend to reply that we get a lot out of this mobility, because we accumulate less cruft. However, in research settings, for reproducibility reasons, one needs to be able to pick up an old codebase and trust its results without knowing its intricacies. From a practical standpoint, I believe that people implementing large changes to the numpy codebase, or any other core scipy package, should think really hard about their impact.
I do realise that the changes are discussed on the mailing lists, but there is a lot of activity to follow and I don't believe that it is possible for many of us to monitor the discussions. Also, putting more emphasis on backward compatibility is possible. For instance, the 'order' parameter added to np.copy could have defaulted to the old behavior, 'K', for a year, with a DeprecationWarning, same thing for the casting rules. Thank you for reading this long email. I don't mean it to be a complaint about the past, but more a suggestion on something to keep in mind when making changes to core projects. Cheers, Gaël [1] https://github.com/scikit-learn/scikit-learn/commit/7842748cf777412c506a8c0ed28090711d3a3783 [2] http://mail.scipy.org/pipermail/numpy-discussion/2012-September/063985.html [3] http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063126.html ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org
Re: [Numpy-discussion] Making numpy sensible: backward compatibility please
On Sep 28, 2012, at 4:53 PM, Henry Gomersall wrote: On Fri, 2012-09-28 at 16:43 -0500, Travis Oliphant wrote:

I agree that we should be much more cautious about semantic changes in the 1.X series of NumPy. How we handle situations where 1.6 changed things from 1.5 and wasn't reported until now is an open question and depends on the particular problem in question. I agree that we should be much more cautious about changes (particularly semantic changes that will break existing code).

One thing I noticed in my (short and shallow) foray into numpy development was the rather limited scope of the tests in the area I touched (fft). I know not the extent to which this is true across the code base, but I know from experience the value of a truly exhaustive test set (every line tested for every condition). Perhaps someone with a deeper knowledge could comment on this?

Thank you for bringing this up. It is definitely a huge flaw of NumPy that it does not have more extensive testing. It is a result of the limited resources under which NumPy has been developed. We are trying to correct this problem over time --- but it takes time. In the meantime, there is a huge install base of code out there which acts as a de-facto test suite of NumPy. We just need to make sure those tests actually get run on new versions of NumPy and we get reports back of failures --- especially when subtle changes have taken place in the way things work (iteration in ufuncs and coercion rules being the most obvious). This results in longer release cycles if releases contain code that significantly changes the way things work (removed APIs, altered coercion rules, etc.).

The alteration of the semantics of how the base attribute works is a good example. Everyone felt it was a good idea to have the .base attribute point to the actual object holding the memory (and it fixed a well-known example of how you could crash Python by building up a stack of array-object references).
However, our fix created a problem for code that uses memmap objects and relied on the fact that the .base attribute would hold a reference to the most recent *memmap* object. This was an unforeseen problem with our change. On the other hand, change is a good thing and we don't want NumPy to stop getting improvements. We just have to be careful that we don't allow our enthusiasm for new features and changes to over-ride our responsibility to end-users. I appreciate the efforts of all the NumPy developers in working through the inevitable debates that differences in perspective on that fundamental trade-off will bring. Best, -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Making numpy sensible: backward compatibility please
From a practical standpoint, I believe that people implementing large changes to the numpy codebase, or any other core scipy package, should think really hard about their impact. I do realise that the changes are discussed on the mailing lists, but there is a lot of activity to follow and I don't believe that it is possible for many of us to monitor the discussions. Also, putting more emphasis on backward compatibility is possible. For instance, the 'order' parameter added to np.copy could have defaulted to the old behavior, 'K', for a year, with a DeprecationWarning, same thing for the casting rules. Maybe it still can, but you have to tell us details :-) In general numpy development just needs more people keeping track of these things. If you want to keep an open source stack functional sometimes you have to pay a tax of your time to making sure the levels below you will continue to suit your needs. Thanks for the thorough and thoughtful response. Well spoken... -Travis -n Thank you for reading this long email. I don't mean it to be a complaint about the past, but more a suggestion on something to keep in mind when making changes to core projects. Cheers, Gaël [1] https://github.com/scikit-learn/scikit-learn/commit/7842748cf777412c506a8c0ed28090711d3a3783 [2] http://mail.scipy.org/pipermail/numpy-discussion/2012-September/063985.html [3] http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063126.html ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] np.array execution path
Check to see if this expression is true:

no is o

In the first case no and o are the same object.

Travis

-- Travis Oliphant (on a mobile) 512-826-7480

On Sep 22, 2012, at 1:01 PM, Sebastian Berg sebast...@sipsolutions.net wrote:

Hi, I have a bit of trouble figuring this out. I would have expected np.asarray(array) to go through ctors, PyArray_NewFromArray, but it seems to me it does not, so which execution path is exactly taken here? The reason I am asking is that I want to figure out this behavior/bug, and I really am not sure which function is responsible:

In [69]: o = np.ones(3)
In [70]: no = np.asarray(o, order='C')
In [71]: no[:] = 10
In [72]: o # OK, o was changed in place:
Out[72]: array([ 10., 10., 10.])
In [73]: no.flags # But no claims to own its data!
Out[73]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
In [74]: no = np.asarray(o, order='F')
In [75]: no[:] = 11
In [76]: o # Here asarray actually returned a real copy!
Out[76]: array([ 10., 10., 10.])

Thanks, Sebastian
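The identity check suggested above can be written out as a short script. This is only a sketch against a current NumPy; it uses a 2-D array for the order='F' case because a 1-D contiguous array already satisfies both memory orders, so whether a copy is made there depends on the NumPy version.

```python
import numpy as np

o = np.ones(3)
no = np.asarray(o, order='C')

# asarray hands back the very same object when no conversion is needed,
# which is why no.flags reports OWNDATA : True (they are o's flags).
same_object = no is o
no[:] = 10                      # therefore this also modifies o

# A C-ordered 2-D array cannot be reinterpreted as Fortran-ordered,
# so here asarray has to allocate a new array.
o2 = np.ones((2, 3))
fo = np.asarray(o2, order='F')

print(same_object)
print(o)
print(fo is o2, np.shares_memory(fo, o2))
```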
Re: [Numpy-discussion] specifying numpy as dependency in your project, install_requires
On Sep 21, 2012, at 3:13 PM, Ralf Gommers wrote:

Hi, An issue I keep running into is that packages use:

install_requires = ["numpy"]

or

install_requires = ['numpy >= 1.6']

in their setup.py. This simply doesn't work a lot of the time. I actually filed a bug against patsy for that (https://github.com/pydata/patsy/issues/5), but Nathaniel is right that it would be better to bring it up on this list. The problem is that if you use pip, it doesn't detect numpy (may work better if you had installed numpy with setuptools) and tries to automatically install or upgrade numpy. That won't work if users don't have the right compiler. Just as bad would be that it does work, and the user didn't want to upgrade for whatever reason. This isn't just my problem; at Wes' pandas tutorial at EuroScipy I saw other people have the exact same problem. My recommendation would be to not use install_requires for numpy, but simply do something like this in setup.py:

try:
    import numpy
except ImportError:
    raise ImportError("my_package requires numpy")

or

try:
    from numpy.version import short_version as npversion
except ImportError:
    raise ImportError("my_package requires numpy")
if npversion < '1.6':
    raise ImportError("Numpy version is %s; required is version >= 1.6" % npversion)

Any objections, better ideas? Is there a good place to put it in the numpy docs somewhere?

I agree. I would recommend against using install_requires. -Travis
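One caveat with the check quoted above is that it compares version strings, and '1.10' sorts before '1.6' as text. A minimal sketch of a numeric comparison follows; the helper names version_tuple and require_numpy are hypothetical, and my_package is the placeholder name from the message above.

```python
import re


def version_tuple(version):
    """Return the leading numeric components, e.g. '1.7.0b1' -> (1, 7, 0)."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break                      # stop at the first non-numeric piece
        parts.append(int(match.group()))
    return tuple(parts)


def require_numpy(minimum=(1, 6)):
    """Raise ImportError unless a new enough numpy can be imported."""
    try:
        import numpy
    except ImportError:
        raise ImportError("my_package requires numpy")
    if version_tuple(numpy.__version__) < minimum:
        raise ImportError("Numpy version is %s; required is version >= %s"
                          % (numpy.__version__,
                             '.'.join(str(n) for n in minimum)))
    return numpy.__version__


print('1.10.0' < '1.6')                                # string comparison
print(version_tuple('1.10.0') < version_tuple('1.6'))  # numeric comparison
```

Calling require_numpy() near the top of setup.py would reproduce the behaviour recommended in the message without asking pip to install or upgrade numpy.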
Re: [Numpy-discussion] tests for casting table? (was: Numpy 1.7b1 API change cause big trouble)
Here are a couple of scripts that might help (I used them to compare casting tables between various versions of NumPy):

Casting Table Creation Script

import numpy as np

operators = np.set_numeric_ops().values()
types = '?bhilqpBHILQPfdgFDGO'
to_check = ['add', 'divide', 'minimum', 'maximum', 'remainder',
            'true_divide', 'logical_or', 'bitwise_or', 'right_shift',
            'less', 'equal']
operators = [op for op in operators if op.__name__ in to_check]

def type_wrap(op):
    # Wrap op so that it returns the dtype character of the result,
    # or 'X' if the operation fails for this pair of types.
    def func(obj1, obj2):
        try:
            result = op(obj1, obj2)
            char = result.dtype.char
        except:
            char = 'X'
        return char
    return func

def coerce():
    result = {}
    for op in operators:
        d = {}
        name = op.__name__
        print name
        op = type_wrap(op)
        for type1 in types:
            s1 = np.dtype(type1).type(2)
            a1 = np.dtype(type1).type([1,2,3])
            for type2 in types:
                s2 = np.dtype(type2).type(1)
                a2 = np.dtype(type2).type([2,3,4])
                codes = []
                # scalar op scalar
                codes.append(op(s1, s2))
                # scalar op array
                codes.append(op(s1, a2))
                # array op scalar
                codes.append(op(a1, s2))
                # array op array
                codes.append(op(a1, a2))
                d[type1,type2] = codes
        result[name] = d
    #for check_key in to_check:
    #    for key in result.keys():
    #        if key == check_key:
    #            continue
    #        if result[key] == result[check_key]:
    #            del result[key]
    #assert set(result.keys()) == set(to_check)
    return result

import sys
if sys.maxint > 2**33:
    bits = 64
else:
    bits = 32

def write():
    import cPickle
    # binary mode, since pickle protocol 2 is a binary format
    file = open('coercion-%s-%sbit.pkl'%(np.__version__, bits),'wb')
    cPickle.dump(coerce(),file,protocol=2)
    file.close()

if __name__ == '__main__':
    write()

Comparison Script

import numpy as np

def compare(result1, result2):
    for op in result1.keys():
        print op
        if op not in result2:
            print op, "not in the first"
            continue
        table1 = result1[op]
        table2 = result2[op]
        if table1 == table2:
            print "Tables are the same"
        else:
            if set(table1.keys()) != set(table2.keys()):
                print "Keys are not the same"
                continue
            for key in table1.keys():
                if table1[key] != table2[key]:
                    print "Different at ", key, ": ", table1[key], table2[key]

import cPickle
import sys

if __name__ == '__main__':
    name1 = 'coercion-1.5.1-64bit.pkl'
    name2 = 'coercion-1.6.1-64bit.pkl'
    if len(sys.argv) > 1:
        name1 = 'coercion-%s-64bit.pkl' % sys.argv[1]
    if len(sys.argv) > 2:
        name2 = 'coercion-%s-64bit.pkl' % sys.argv[2]
    result1 = cPickle.load(open(name1, 'rb'))
    result2 = cPickle.load(open(name2, 'rb'))
    compare(result1, result2)

On Sep 20, 2012, at 3:09 PM, Nathaniel Smith wrote: On Mon, Sep 17, 2012 at 10:22 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Sep 9, 2012 at 6:12 PM, Frédéric Bastien no...@nouiz.org wrote:

The third is related to the change to the casting rules in numpy. Before, a scalar complex128 * vector float32 gave a vector of dtype complex128. Now it gives a vector of complex64. The reason is that now a scalar of a different category only changes the category, not the precision. I would consider it a must that we warn clearly about this interface change. Most people won't see it, but people that optimize their code heavily could depend on such a thing.

It seems to me that it would be a very good idea to put the casting table results into the tests to make sure we are keeping track of this kind of thing. I'm happy to try to do it if no-one else more qualified has time.

I haven't seen any PRs show up from anyone else in the last few days, and this would indeed be an excellent test to have, so that would be awesome. -n
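For readers on a current interpreter, the idea behind the scripts above can be sketched in Python 3. This is not a port of the original: it assumes a recent NumPy, covers only array-with-array operands (scalar handling has changed across releases), and uses a deliberately small set of type codes and ufuncs chosen for the illustration. The names result_char and casting_table are hypothetical.

```python
import numpy as np

TYPES = 'bhifd'                       # int8, int16, int32, float32, float64
OPS = (np.add, np.maximum, np.less)   # small sample of binary ufuncs


def result_char(op, x, y):
    """dtype character of op(x, y), or 'X' if the operation is rejected."""
    try:
        return np.asarray(op(x, y)).dtype.char
    except TypeError:
        return 'X'


def casting_table(ops=OPS, types=TYPES):
    """Map (ufunc name, type1, type2) to the result dtype character."""
    table = {}
    for op in ops:
        for t1 in types:
            a1 = np.array([1, 2, 3], dtype=t1)
            for t2 in types:
                a2 = np.array([2, 3, 4], dtype=t2)
                table[op.__name__, t1, t2] = result_char(op, a1, a2)
    return table


table = casting_table()
print(table['add', 'h', 'd'], table['add', 'i', 'f'], table['less', 'i', 'f'])
```

Saving such a dictionary for two NumPy versions and comparing them key by key is the same approach as the comparison script above, and the stored table could serve as the regression test proposed in this thread.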
Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)
On Sep 18, 2012, at 1:47 PM, Charles R Harris wrote: On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root ben.r...@ou.edu wrote: On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant tra...@continuum.io wrote: On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote: Consider the following code: import numpy as np a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(255) / 15 In v1.6.x, this yields: array([17, 34, 51, 68, 85], dtype=int16) But in master, this throws an exception about failing to cast via same_kind. Note that numpy was smart about this operation before, consider: a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(128) / 256 yields: array([0, 1, 1, 2, 2], dtype=int16) Of course, this is different than if one does it in a non-in-place manner: np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5 which yields an array with floating point dtype in both versions. I can appreciate the arguments for preventing this kind of implicit casting between non-same_kind dtypes, but I argue that because the operation is in-place, then I (as the programmer) am explicitly stating that I desire to utilize the current array to store the results of the operation, dtype and all. Obviously, we can't completely turn off this rule (for example, an in-place addition between integer array and a datetime64 makes no sense), but surely there is some sort of happy medium that would allow these sort of operations to take place? Lastly, if it is determined that it is desirable to allow in-place operations to continue working like they have before, I would like to see such a fix in v1.7 because if it isn't in 1.7, then other libraries (such as matplotlib, where this issue was first found) would have to change their code anyway just to be compatible with numpy. I agree that in-place operations should allow different casting rules. 
There are different opinions on this, of course, but generally this is how NumPy has worked in the past. We did decide to change the default casting rule to same_kind but making an exception for in-place seems reasonable. I think that in these cases same_kind will flag what are most likely programming errors and sloppy code. It is easy to be explicit and doing so will make the code more readable because it will be immediately obvious what the multiplicand is without the need to recall what the numpy casting rules are in this exceptional case. IISTR several mentions of this before (Gael?), and in some of those cases it turned out that bugs were being turned up. Catching bugs with minimal effort is a good thing. Chuck True, it is quite likely to be a programming error, but then again, there are many cases where it isn't. Is the problem strictly that we are trying to downcast the float to an int, or is it that we are trying to downcast to a lower precision? Is there a way for one to explicitly relax the same_kind restriction? I think the problem is down casting across kinds, with the result that floats are truncated and the imaginary parts of imaginaries might be discarded. That is, the value, not just the precision, of the rhs changes. So I'd favor an explicit cast in code like this, i.e., cast the rhs to an integer. It is true that this forces downstream to code up to a higher standard, but I don't see that as a bad thing, especially if it exposes bugs. And it isn't difficult to fix. Shouldn't we be issuing a warning, though? Even if the desire is to change the casting rules? The fact that multiple codes are breaking and need to be upgraded seems like a hard thing to require of someone going straight from 1.6 to 1.7. That's what I'm opposed to. 
All of these efforts move NumPy to its use as a library instead of an interactive environment where it started which is a good direction to move, but managing this move in the context of a very large user-community is the challenge we have. -Travis Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)
On Sep 18, 2012, at 2:44 PM, Charles R Harris wrote: On Tue, Sep 18, 2012 at 1:35 PM, Benjamin Root ben.r...@ou.edu wrote: On Tue, Sep 18, 2012 at 3:25 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root ben.r...@ou.edu wrote: On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root ben.r...@ou.edu wrote: On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant tra...@continuum.io wrote: On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote: Consider the following code: import numpy as np a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(255) / 15 In v1.6.x, this yields: array([17, 34, 51, 68, 85], dtype=int16) But in master, this throws an exception about failing to cast via same_kind. Note that numpy was smart about this operation before, consider: a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(128) / 256 yields: array([0, 1, 1, 2, 2], dtype=int16) Of course, this is different than if one does it in a non-in-place manner: np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5 which yields an array with floating point dtype in both versions. I can appreciate the arguments for preventing this kind of implicit casting between non-same_kind dtypes, but I argue that because the operation is in-place, then I (as the programmer) am explicitly stating that I desire to utilize the current array to store the results of the operation, dtype and all. Obviously, we can't completely turn off this rule (for example, an in-place addition between integer array and a datetime64 makes no sense), but surely there is some sort of happy medium that would allow these sort of operations to take place? 
Lastly, if it is determined that it is desirable to allow in-place operations to continue working like they have before, I would like to see such a fix in v1.7 because if it isn't in 1.7, then other libraries (such as matplotlib, where this issue was first found) would have to change their code anyway just to be compatible with numpy. I agree that in-place operations should allow different casting rules. There are different opinions on this, of course, but generally this is how NumPy has worked in the past. We did decide to change the default casting rule to same_kind but making an exception for in-place seems reasonable. I think that in these cases same_kind will flag what are most likely programming errors and sloppy code. It is easy to be explicit and doing so will make the code more readable because it will be immediately obvious what the multiplicand is without the need to recall what the numpy casting rules are in this exceptional case. IISTR several mentions of this before (Gael?), and in some of those cases it turned out that bugs were being turned up. Catching bugs with minimal effort is a good thing. Chuck True, it is quite likely to be a programming error, but then again, there are many cases where it isn't. Is the problem strictly that we are trying to downcast the float to an int, or is it that we are trying to downcast to a lower precision? Is there a way for one to explicitly relax the same_kind restriction? I think the problem is down casting across kinds, with the result that floats are truncated and the imaginary parts of imaginaries might be discarded. That is, the value, not just the precision, of the rhs changes. So I'd favor an explicit cast in code like this, i.e., cast the rhs to an integer. It is true that this forces downstream to code up to a higher standard, but I don't see that as a bad thing, especially if it exposes bugs. And it isn't difficult to fix. 
Chuck Mind you, in my case, casting the rhs as an integer before doing the multiplication would be a bug, since our value for the rhs is usually between zero and one. Multiplying first by the integer numerator before dividing by the integer denominator would likely cause issues with overflowing the 16 bit integer. For the case in point I'd do In [1]: a = np.array([1, 2, 3, 4, 5], dtype=np.int16) In [2]: a //= 2 In [3]: a Out[3]: array([0, 1, 1, 2, 2], dtype=int16) Although I expect you would want something different in practice. But the current code already looks fragile to me and I think it is a good thing you are taking a closer look at it. If you really intend going through a float, then it should be something like a = (a*(float(128)/256)).astype(int16) Chuck And thereby losing the memory benefit of an in-place multiplication? What makes you think you are getting that? I'd have to check the numpy C source, but I expect the multiplication is handled just as I wrote it out. I don't recall any
Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)
That is sort of the point of all this. We are using 16 bit integers because we wanted to be as efficient as possible and didn't need anything larger. Note, that is what we changed the code to, I am just wondering if we are being too cautious. The casting kwarg looks to be what I might want, though it isn't as clean as just writing an *= statement. I think even there you will have an intermediate float array followed by a cast. This is true, but it is done in chunks of a fixed size (controllable by a thread-local variable or keyword argument to the ufunc). How difficult would it be to change in-place operations back to the unsafe default? Probably not too difficult, but I think it would be a mistake. What keyword argument are you referring to? In the current case, I think what is wanted is a scaling function that will actually do things in place. The matplotlib folks would probably be happier with the result if they simply coded up a couple of small Cython routines to do that. http://docs.scipy.org/doc/numpy/reference/ufuncs.html#ufunc In particular, the extobj keyword argument or the thread-local variable at umath.UFUNC_PYVALS_NAME. But the problem is not just for matplotlib. Matplotlib is showing a symptom of the problem of just changing the default casting mode in one release. I think this is too stark of a change for a single minor release without some kind of glide path or warning system. I think we need to change in-place multiplication back to unsafe and then put in the release notes that we are planning on changing this for 1.8. It would be ideal if we could raise a warning when unsafe castings occur. -Travis
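The casting keyword mentioned in this exchange can be used to request the old in-place behaviour explicitly. The following is a minimal sketch, assuming a NumPy release in which same_kind is the default for in-place operations; it uses the int16 example from earlier in the thread.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
try:
    a *= 0.5                      # float result cannot be stored as int16
    in_place_allowed = True       # same_kind casting
except TypeError:
    in_place_allowed = False

# Opting in explicitly: the result is written into b, truncating toward zero.
b = np.array([1, 2, 3, 4, 5], dtype=np.int16)
np.multiply(b, 0.5, out=b, casting='unsafe')

print(in_place_allowed)
print(b, b.dtype)
```

The explicit call keeps the int16 storage without a separate astype step, at the cost of being more verbose than the *= statement.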
Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)
On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote: Consider the following code: import numpy as np a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(255) / 15 In v1.6.x, this yields: array([17, 34, 51, 68, 85], dtype=int16) But in master, this throws an exception about failing to cast via same_kind. Note that numpy was smart about this operation before, consider: a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(128) / 256 yields: array([0, 1, 1, 2, 2], dtype=int16) Of course, this is different than if one does it in a non-in-place manner: np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5 which yields an array with floating point dtype in both versions. I can appreciate the arguments for preventing this kind of implicit casting between non-same_kind dtypes, but I argue that because the operation is in-place, then I (as the programmer) am explicitly stating that I desire to utilize the current array to store the results of the operation, dtype and all. Obviously, we can't completely turn off this rule (for example, an in-place addition between integer array and a datetime64 makes no sense), but surely there is some sort of happy medium that would allow these sort of operations to take place? Lastly, if it is determined that it is desirable to allow in-place operations to continue working like they have before, I would like to see such a fix in v1.7 because if it isn't in 1.7, then other libraries (such as matplotlib, where this issue was first found) would have to change their code anyway just to be compatible with numpy. I agree that in-place operations should allow different casting rules. There are different opinions on this, of course, but generally this is how NumPy has worked in the past. We did decide to change the default casting rule to same_kind but making an exception for in-place seems reasonable. -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change in behavior of np.concatenate for upcoming release
I was working on the same fix and so I saw your code was similar and merged it. It needs to be back-ported to 1.7.0. Thanks, -Travis

On Sep 15, 2012, at 11:06 AM, Han Genuit wrote: Okay, sent in a pull request: https://github.com/numpy/numpy/pull/443.
Re: [Numpy-discussion] Change in behavior of np.concatenate for upcoming release
It's very nice to get your help. I hope I haven't inappropriately set expectations :-)

-Travis

On Sep 15, 2012, at 3:14 PM, Han Genuit wrote:

> Yeah, that merge was fast. :-)
>
> Regards,
> Han
Re: [Numpy-discussion] Obscure code in concatenate code path?
On Sep 13, 2012, at 8:40 AM, Nathaniel Smith wrote:

> On Thu, Sep 13, 2012 at 11:12 AM, Matthew Brett matthew.br...@gmail.com wrote:
>> Hi,
>>
>> While writing some tests for np.concatenate, I ran foul of this code:
>>
>>     if (axis >= NPY_MAXDIMS) {
>>         ret = PyArray_ConcatenateFlattenedArrays(narrays, arrays, NPY_CORDER);
>>     }
>>     else {
>>         ret = PyArray_ConcatenateArrays(narrays, arrays, axis);
>>     }
>>
>> in multiarraymodule.c
>
> How deeply weird

This is expected behavior. It's how the concatenate Python function manages to handle axis=None to flatten the arrays before concatenation. This has been in NumPy since 1.0 and should not be changed without deprecation warnings, which I am -0 on.

Now, it is true that the C-API could have been written differently (I think this is what Mark was trying to encourage) so that there are two C-API functions and they are dispatched separately from the array_concatenate method depending on whether or not a None is passed in. But the behavior is documented and has been for a long time.

Reference PyArray_AxisConverter (which turns a None Python argument into an axis=NPY_MAXDIMS). This is consistent behavior throughout the C-API.

-Travis
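The sentinel dispatch described in the message above can be sketched in plain Python. This is only an illustration of the pattern, not NumPy's implementation: `axis_converter`, `flatten`, and `concatenate` are hypothetical stand-ins operating on nested lists, and the value 32 for `NPY_MAXDIMS` is taken from the NumPy 1.x headers purely for illustration.

```python
NPY_MAXDIMS = 32  # value from NumPy 1.x headers; used here only as a sentinel


def axis_converter(axis):
    """Hypothetical stand-in for PyArray_AxisConverter: None -> sentinel."""
    return NPY_MAXDIMS if axis is None else int(axis)


def flatten(nested):
    """Flatten arbitrarily nested lists in row-major (C) order."""
    out = []
    for item in nested:
        if isinstance(item, list):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out


def concatenate(arrays, axis=0):
    """Toy concatenate that dispatches on the converted axis value."""
    axis = axis_converter(axis)
    if axis >= NPY_MAXDIMS:
        # axis=None path: flatten every input, then join end to end.
        result = []
        for arr in arrays:
            result.extend(flatten(arr))
        return result
    if axis == 0:
        # Ordinary path: join along the first dimension.
        result = []
        for arr in arrays:
            result.extend(arr)
        return result
    raise NotImplementedError("sketch handles only axis=0 and axis=None")


x = [[1, 2], [3, 4]]
y = [[5, 6]]
flat = concatenate([x, y], axis=None)
stacked = concatenate([x, y], axis=0)
print(flat)
print(stacked)
```

Because the converter maps None to a value no real axis can take, a single integer argument can carry both cases through the call chain, which is why the `>=` comparison against the maximum dimension count appears in the C code.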
Re: [Numpy-discussion] Obscure code in concatenate code path?
>> This is expected behavior. It's how the concatenate Python function manages to handle axis=None to flatten the arrays before concatenation. This has been in NumPy since 1.0 and should not be changed without deprecation warnings, which I am -0 on.
>>
>> Now, it is true that the C-API could have been written differently (I think this is what Mark was trying to encourage) so that there are two C-API functions and they are dispatched separately from the array_concatenate method depending on whether or not a None is passed in. But the behavior is documented and has been for a long time.
>>
>> Reference PyArray_AxisConverter (which turns a None Python argument into an axis=NPY_MAXDIMS). This is consistent behavior throughout the C-API.
>
> How about something like:
>
>     #define NPY_NONE_AXIS NPY_MAXDIMS
>
> to make it clearer what is intended?

+1

-Travis