Re: [Numpy-discussion] Insights / lessons learned from NumPy design
Just wanted to say a big thanks to everyone in the NumPy community who has commented on this topic - it's given us a lot to think about and a lot of good ideas to work into the design!

Best regards,
Mike.

On 4 January 2013 14:29, Mike Anderson mike.r.anderson...@gmail.com wrote:

Hello all,

In the Clojure community there has been some discussion about creating a common matrix maths library / API. Currently there are a few different fledgeling matrix libraries in Clojure, so it seemed like a worthwhile effort to unify them and have a common base on which to build. NumPy has been something of an inspiration for this, so I thought I'd ask here to see what lessons have been learned.

We're thinking of a matrix library with roughly the following design (subject to change!):

- Support for multi-dimensional matrices (but with fast paths for 1D vectors and 2D matrices as the common cases)
- Immutability by default, i.e. matrix operations are pure functions that create new matrices. There could be a backdoor option to mutate matrices, but that would be unidiomatic in Clojure
- Support for 64-bit double precision floats only (this is the standard float type in Clojure)
- Ability to support multiple different back-end matrix implementations (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
- A full range of matrix operations. Operations would be delegated to back end implementations where they are supported, otherwise generic implementations could be used.

Any thoughts on this topic based on the NumPy experience? In particular, it would be very interesting to know:

- Features in NumPy which proved to be redundant / not worth the effort
- Features that you wish had been designed in at the start
- Design decisions that turned out to be a particularly big mistake / success

Would love to hear your insights, any ideas+advice greatly appreciated!

Mike.

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On 10 January 2013 05:19, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote:

On Wed, Jan 9, 2013 at 2:57 AM, Mike Anderson wrote:

I'm hoping the API will be independent of storage format - i.e. the underlying implementations can store the data any way they like. So the API will be written in terms of abstractions, and the user will have the choice of whatever concrete implementation best fits the specific needs. Sparse matrices, tiled matrices etc. should all be possible options.

A note about that -- as I think of it, numpy arrays are two things:

1) a python object for working with numbers, in a wide variety of ways
2) a wrapper around a C-array (or data block) that can be used to provide an easy way for Python to interact with C (and Fortran, and...) libraries, etc.

As it turns out a LOT of people use numpy for (2) -- what this means is that while you could change the underlying data representation, etc, and keep the same Python API -- such changes would break a lot of non-pure-python code that relies on that data representation. This is a big issue with the numpy-for-PyPy project -- they could write a numpy clone, but it would only be useful for the pure-python stuff. Even then, a number of folks do tricks with numpy arrays in python that rely on the underlying structure.

Not sure how all this would play out for Clojure, but it's something to keep in mind.

Thanks Chris - this is a really helpful insight.

Trying to translate that into the Clojure world, I think that's roughly equivalent to separating the API (roughly equivalent to the methods in the ndarray referred to in 1) from the specific implementations (which will probably include a data block ndarray-style wrapper like 2, but would also leave open other implementation options). That way the majority of users can code purely against the API, and they won't be affected if (when?) the underlying implementation changes. In this way, they should be able to get the benefits of 2) without building a direct dependency on it.

Of course, I still expect some users to circumvent the API and build a dependency on the underlying implementation. Nothing we can do to stop that, and they may even have good reasons like hardcore performance optimization. We have to assume at that point they know what they are doing and are prepared to live with the consequences :-)
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On 8 January 2013 02:08, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote:

On Thu, Jan 3, 2013 at 10:29 PM, Mike Anderson mike.r.anderson...@gmail.com wrote:

In the Clojure community there has been some discussion about creating a common matrix maths library / API. Currently there are a few different fledgeling matrix libraries in Clojure, so it seemed like a worthwhile effort to unify them and have a common base on which to build. NumPy has been something of an inspiration for this, so I thought I'd ask here to see what lessons have been learned.

A few thoughts:

We're thinking of a matrix library

First -- is this a matrix library, or a general use nd-array library? That will drive your design a great deal. For my part, I came from MATLAB, which started out very focused on matrixes, then extended to be more generally useful. Personally, I found the matrix-focus to get in the way more than help -- in any real code, the actual matrix operations are likely to be a tiny fraction of the code. One reason I like numpy is that it is array-first, with secondary support for matrix stuff.

That being said, there is the numpy matrix type, and there are those that find it very useful, particularly in teaching situations, though it feels a bit tacked-on, and that does get in the way, so if you want a real matrix object, but also a general purpose array lib, thinking about both up front will be helpful.

This is very useful context - thanks! I've had opinions in favour of both an nd-array style library and a matrix library. I guess it depends on your use case which one you are more inclined to think in. I'm hoping that it should be possible for the same API to support both, i.e. you should be able to use a 2D array of numbers as a matrix, and vice-versa.

- Support for multi-dimensional matrices (but with fast paths for 1D vectors and 2D matrices as the common cases)

what is a multi-dimensional matrix? -- is a 3-d something a stack of matrixes? or something else? (note, numpy lacks this kind of object, but it is sometimes asked for -- i.e. a way to do fast matrix multiplication with a lot of small matrixes)

I think fast paths for 1-D and 2-D are secondary, though you may want easy paths for those. In particular, if you want good support for linear algebra (matrixes), then having a clean and natural row vector and column vector would be nice. See the archives of this list for a bunch of discussion about that -- and what the weaknesses are of the numpy matrix object.

- Immutability by default, i.e. matrix operations are pure functions that create new matrices.

I'd be careful about this -- the purity and predictability is nice, but these days a lot of time is spent allocating and moving memory around -- the mutability of numpy arrays is a key feature -- indeed, the key performance issues with numpy surround the fact that many copies may be made unnecessarily (note, Dag's suggestion of lazy evaluation may mitigate this to some extent).

Interesting and very useful to know. Sounds like we should definitely allow for mutable arrays / zero-copy operations in that case if that is proving to be a big bottleneck.

- Support for 64-bit double precision floats only (this is the standard float type in Clojure)

not a bad start, but another major strength of numpy is the multiple data types - you may want to design that concept in from the start.

Sounds like good advice and that should be possible to accommodate in the design. But I'm curious: what is the main use case for the alternative data types in NumPy? Is it for columns of data of heterogeneous types? or something else?

- Ability to support multiple different back-end matrix implementations (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)

This ties in to another major strength of numpy -- ndarrays are both powerful python objects, and wrappers around standard C arrays -- that makes it pretty darn easy to interface with external libraries for core computation.

Great - good to know we are on the right track with this one.

Thanks Chris for all your comments / suggestions!
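As an illustration of the "lot of small matrixes" case raised above, here is a minimal NumPy sketch. It assumes the stack is stored as a 3-D array with the matrix index first; that layout, the sizes and all variable names are assumptions made for the example, not something proposed in the thread. It uses `np.einsum` to multiply every matrix in the stack by a vector in one call and checks the result against an explicit loop.

```python
import numpy as np

# Assumed layout: stack[k] is the k-th small matrix (3x3 here).
rng = np.random.default_rng(0)
stack = rng.standard_normal((5, 3, 3))   # 5 matrices, each 3x3
vecs = rng.standard_normal((5, 3))       # one vector per matrix
v = rng.standard_normal(3)               # one vector shared by all matrices

# Every matrix times the same vector: same_vec[k] = stack[k] @ v
same_vec = np.einsum("kij,j->ki", stack, v)

# Every matrix times its own vector: own_vec[k] = stack[k] @ vecs[k]
own_vec = np.einsum("kij,kj->ki", stack, vecs)

# Reference results from an explicit Python loop over the stack.
loop_same = np.array([m @ v for m in stack])
loop_own = np.array([m @ x for m, x in zip(stack, vecs)])
```

The subscript strings make the choice explicit: `k` labels the matrix, `i` and `j` label its rows and columns, so a different storage order only changes the subscripts, not the code around it.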
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On 4 January 2013 16:00, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.nowrote: On 01/04/2013 07:29 AM, Mike Anderson wrote: Hello all, In the Clojure community there has been some discussion about creating a common matrix maths library / API. Currently there are a few different fledgeling matrix libraries in Clojure, so it seemed like a worthwhile effort to unify them and have a common base on which to build on. NumPy has been something of an inspiration for this, so I though I'd ask here to see what lessons have been learned. We're thinking of a matrix library with roughly the following design (subject to change!) - Support for multi-dimensional matrices (but with fast paths for 1D vectors and 2D matrices as the common cases) Food for thought: Myself I have vectors that are naturally stored in 2D, matrices that can be naturally stored in 4D and so on (you can't view them that way when doing linear algebra, it's just that the indices can have multiple components) -- I like that NumPy calls everything array; I think vector and matrix are higher-level mathematical concepts. Very interesting. Can I ask what the application is? And is it equivalent from a mathematical perspective to flattening the 2D vectors into very long 1D vectors? - Immutability by default, i.e. matrix operations are pure functions that create new matrices. There could be a backdoor option to mutate matrices, but that would be unidiomatic in Clojure Sounds very promising (assuming you can reuse the buffer if the input matrix had no other references and is not used again?). It's very common for NumPy arrays to fill a large chunk of the available memory (think 20-100 GB), so for those users this would need to be coupled with buffer reuse and good diagnostics that help remove references to old generations of a matrix. Yes it should be possible to re-use buffers, though to some extent that would depend on the underlying matrix library implementation. 
The JVM makes things a bit interesting here - the GC is extremely good but it doesn't play particularly nicely with non-Java native code. 20-100GB is pretty ambitious and I guess reflects the maturity of NumPy - I'd be happy with good handling of 100MB matrices right now. - Support for 64-bit double precision floats only (this is the standard float type in Clojure) - Ability to support multiple different back-end matrix implementations (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.) - A full range of matrix operations. Operations would be delegated to back end implementations where they are supported, otherwise generic implementations could be used. Any thoughts on this topic based on the NumPy experience? In particular would be very interesting to know: - Features in NumPy which proved to be redundant / not worth the effort - Features that you wish had been designed in at the start - Design decisions that turned out to be a particularly big mistake / success Would love to hear your insights, any ideas+advice greatly appreciated! Travis Oliphant noted some of his thoughts on this in the recent thread DARPA funding for Blaze and passing the NumPy torch which is a must-read. Great link. Thanks for this and all your other comments! ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
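For readers who want to see the trade-off between pure functions and buffer reuse in NumPy terms, here is a small sketch. It is only an analogy to the Clojure design being discussed: NumPy arrays are mutable by default, so the read-only flag is used here to imitate "immutability by default", and the variable names are made up for the example.

```python
import numpy as np

a = np.zeros((2, 4))
a.setflags(write=False)      # imitate "immutable by default": writes now raise

try:
    a[0, 3] = 1.0
    write_failed = False
except ValueError:           # NumPy refuses to write into a read-only array
    write_failed = True

# Pure-function style: build a new array instead of mutating the input.
b = a.copy()                 # the copy owns its data and is writeable
b[0, 3] = 1.0

# "Backdoor" style: reuse an existing buffer instead of allocating a new one.
c = np.ones((2, 4))
address_before = c.ctypes.data   # address of c's data block
np.add(c, b, out=c)              # c <- c + b, written into c's own buffer
address_after = c.ctypes.data
```

The first style allocates a full copy for a one-element change, which is the cost raised for very large arrays; the second keeps the same data block at the price of mutating `c`.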
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On 4 January 2013 16:13, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote:

On 01/04/2013 09:00 AM, Dag Sverre Seljebotn wrote:

On 01/04/2013 07:29 AM, Mike Anderson wrote:

<snip>

Oh: Depending on your ambitions, it's worth thinking hard about i) storage format, and ii) lazy evaluation.

Storage format: The new trend is for more flexible formats than just column-major/row-major, e.g., storing cache-sized n-dimensional tiles.

I'm hoping the API will be independent of storage format - i.e. the underlying implementations can store the data any way they like. So the API will be written in terms of abstractions, and the user will have the choice of whatever concrete implementation best fits the specific needs. Sparse matrices, tiled matrices etc. should all be possible options. Has this kind of approach been used much with NumPy?

Lazy evaluation: The big problem with numpy is that a + b + np.sqrt(c) will first make a temporary result for a + b, rather than doing the whole expression on the fly, which is *very* bad for performance. So if you want immutability, I urge you to consider having every operation build up an expression tree/program, and then either find out the smart points where you interpret that program automatically, or make explicit eval() of an expression tree the default mode.

Very interesting. Seems like this could be layered on top though? i.e. have a separate DSL for building up the expression tree, then compile this down to the optimal set of underlying operations?

Of course this all depends on how ambitious you are.

A little ambitious, though mostly I'll be glad to get something working that people find useful :-)

Thanks again for your comments Dag!
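To make the expression-tree idea above concrete, here is a minimal sketch in Python. It is not how NumPy or any particular library works; the classes `Expr`, `Leaf`, `BinOp`, `UnaryOp` and the helper `lazy_sqrt` are hypothetical names invented for this example, and it assumes 1-D arrays of equal length. Operations only build a tree; `evaluate()` then walks the data in blocks, so intermediate results are never larger than one chunk.

```python
import numpy as np

class Expr:
    """Node of a lazily built element-wise expression tree (hypothetical)."""

    def __add__(self, other):
        return BinOp(np.add, self, wrap(other))

    def evaluate(self, chunk=4096):
        """Evaluate block by block so temporaries stay chunk-sized."""
        n = self.size()
        out = np.empty(n)
        for start in range(0, n, chunk):
            sl = slice(start, start + chunk)
            out[sl] = self.compute(sl)
        return out

class Leaf(Expr):
    """Wraps a concrete 1-D array."""
    def __init__(self, array):
        self.array = np.asarray(array, dtype=float)
    def size(self):
        return self.array.shape[0]
    def compute(self, sl):
        return self.array[sl]          # a view, no copy

class BinOp(Expr):
    def __init__(self, func, left, right):
        self.func, self.left, self.right = func, left, right
    def size(self):
        return self.left.size()
    def compute(self, sl):
        return self.func(self.left.compute(sl), self.right.compute(sl))

class UnaryOp(Expr):
    def __init__(self, func, arg):
        self.func, self.arg = func, arg
    def size(self):
        return self.arg.size()
    def compute(self, sl):
        return self.func(self.arg.compute(sl))

def wrap(x):
    return x if isinstance(x, Expr) else Leaf(x)

def lazy_sqrt(x):
    return UnaryOp(np.sqrt, wrap(x))

a = Leaf(np.arange(10000.0))
b = Leaf(np.ones(10000))
c = Leaf(np.full(10000, 4.0))

expr = a + b + lazy_sqrt(c)   # builds a tree; nothing is computed yet
result = expr.evaluate()      # one pass over the data, in chunks
```

A real implementation would also handle broadcasting, dtypes and multi-dimensional shapes, and could compile the tree instead of interpreting it, as suggested in the reply about layering a DSL on top.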
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On 01/09/2013 11:49 AM, Mike Anderson wrote: On 4 January 2013 16:00, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no mailto:d.s.seljeb...@astro.uio.no wrote: On 01/04/2013 07:29 AM, Mike Anderson wrote: Hello all, In the Clojure community there has been some discussion about creating a common matrix maths library / API. Currently there are a few different fledgeling matrix libraries in Clojure, so it seemed like a worthwhile effort to unify them and have a common base on which to build on. NumPy has been something of an inspiration for this, so I though I'd ask here to see what lessons have been learned. We're thinking of a matrix library with roughly the following design (subject to change!) - Support for multi-dimensional matrices (but with fast paths for 1D vectors and 2D matrices as the common cases) Food for thought: Myself I have vectors that are naturally stored in 2D, matrices that can be naturally stored in 4D and so on (you can't view them that way when doing linear algebra, it's just that the indices can have multiple components) -- I like that NumPy calls everything array; I think vector and matrix are higher-level mathematical concepts. Very interesting. Can I ask what the application is? And is it equivalent from a mathematical perspective to flattening the 2D vectors into very long 1D vectors? For instance, if you are solving an equation for one value per grid point on a 2D or 3D grid. In PDE problems this occurs all the time, though normally the flattening is treated explicitly before one gets to solving the equation, and when not a reshape operation like you say is usually OK (but the very concept for flattening/reshaping is something that's inherent to arrays, not matrices). Chris also mentioned the case where you have lots of small matrices (say, A[i,j,k] is element (i,j) in matrix k), and you want to multiply all matrices by the same vector, or all matrices by different vectors, and so on. - Immutability by default, i.e. 
matrix operations are pure functions that create new matrices. There could be a backdoor option to mutate matrices, but that would be unidiomatic in Clojure

Sounds very promising (assuming you can reuse the buffer if the input matrix had no other references and is not used again?). It's very common for NumPy arrays to fill a large chunk of the available memory (think 20-100 GB), so for those users this would need to be coupled with buffer reuse and good diagnostics that help remove references to old generations of a matrix.

Yes it should be possible to re-use buffers, though to some extent that would depend on the underlying matrix library implementation. The JVM makes things a bit interesting here - the GC is extremely good but it doesn't play particularly nicely with non-Java native code.

My hunch is that if you rely on the GC you'll get nowhere (though if you're happy to treat 100 MB matrices then that may not matter so much).

20-100GB is pretty ambitious and I guess reflects the maturity of NumPy - I'd be happy with good handling of 100MB matrices right now.

Still, if you copy 100 MB every time you assign to a single element, performance won't be stellar to say the least. I don't know Clojure but I'm thinking that an immutable design would be something like

    b = a but with 1.0 in position (0, 3)
    c = b + (3.2 in position (3, 4))

however you want to express that syntax-wise.

Pasting in your other post:

On 01/09/2013 11:57 AM, Mike Anderson wrote:

On 4 January 2013 16:13,

I'm hoping the API will be independent of storage format - i.e. the underlying implementations can store the data any way they like. So the API will be written in terms of abstractions, and the user will have the choice of whatever concrete implementation best fits the specific needs. Sparse matrices, tiled matrices etc. should all be possible options. Has this kind of approach been used much with NumPy?

No, NumPy only supports strided arrays.
SciPy has sparse matrices using a different API (which is a pain point). Lazy evaluation: The big problem with numpy is that a + b + np.sqrt(c) will first make a temporary result for a + b, rather than doing the whole expression on the fly, which is *very* bad for performance. So if you want immutability, I urge you to consider every operation to build up an expression tree/program, and then either find out the smart points where you interpret that program automatically, or make explicit eval() of an expression tree the default mode. Very interesting. Seems like this could be layered on top though? i.e. have a separate DSL for building up the expression tree, then compile this down to the optimal set of underlying operations? That's what
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On Jan 9, 2013 11:35 AM, Mike Anderson mike.r.anderson...@gmail.com wrote:

But I'm curious: what is the main use case for the alternative data types in NumPy? Is it for columns of data of heterogeneous types? or something else?

In my case, I have used 32 bit (or lower) arrays due to memory limitations and for some significant speedups in certain situations. This was particularly useful when I was preprocessing numerous arrays, especially into Boolean data; it saved a lot of hard disk space and I/O.

I have used 128 bits when precision was critical, as I was dealing with very small differences.

It is also nice to be able to repeat your computation with different precision in order to spot possible numerical instabilities, even if the performance is not great.

David.
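The memory and precision points above can be checked with a few lines of NumPy. This is a small sketch with made-up sizes and values: it compares the storage needed for the same million elements in three dtypes, then repeats one tiny computation in 64-bit and 32-bit floats to show how a lower precision can expose a numerically fragile step.

```python
import numpy as np

n = 1_000_000
as_f64 = np.ones(n, dtype=np.float64)
as_f32 = as_f64.astype(np.float32)
as_bool = as_f64 > 0.5            # dtype=bool, one byte per element

# Bytes used by the data block of each array, keyed by dtype name.
sizes = {arr.dtype.name: arr.nbytes for arr in (as_f64, as_f32, as_bool)}

# The same computation at two precisions: the small term survives in
# float64 but is rounded away entirely in float32.
diff64 = np.float64(1.0) + np.float64(1e-8) - np.float64(1.0)
diff32 = np.float32(1.0) + np.float32(1e-8) - np.float32(1.0)
```

Here the Boolean array needs one eighth of the memory of the float64 array, and the float32 run returns exactly zero where the float64 run returns a small positive number.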
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
I'm just a Python+NumPy user and not a CS type. May I ask a naive question on this thread? Given the work that has (as I understand it) gone into making NumPy usable as a C library, why is the discussion not going in a direction like the following: What changes to the NumPy code base would be required for it to provide useful ndarray functionality in a C extension to Clojure? Is this simply incompatible with the goal that Clojure compile to JVM byte code? Thanks, Alan Isaac ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On Wed, Jan 9, 2013 at 2:53 PM, Alan G Isaac alan.is...@gmail.com wrote: I'm just a Python+NumPy user and not a CS type. May I ask a naive question on this thread? Given the work that has (as I understand it) gone into making NumPy usable as a C library, why is the discussion not going in a direction like the following: What changes to the NumPy code base would be required for it to provide useful ndarray functionality in a C extension to Clojure? Is this simply incompatible with the goal that Clojure compile to JVM byte code? IIUC that work was done on a fork of numpy which has since been abandoned by its authors, so... yeah, numpy itself doesn't have much to offer in this area right now. It could in principle with a bunch of refactoring (ideally not on a fork, since we saw how well that went), but I don't think most happy current numpy users are wishing they could switch to writing Lisp on the JVM or vice-versa, so I don't think it's surprising that no-one's jumped up to do this work. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On 1/9/2013 9:58 AM, Nathaniel Smith wrote: I don't think most happy current numpy users are wishing they could switch to writing Lisp on the JVM or vice-versa, so I don't think it's surprising that no-one's jumped up to do this work. Sure. I'm trying to look at this more from the Clojure end. Is it really better to start from scratch than to attempt a contribution to NumPy that would make it useful to Clojure. Given the amount of work that has gone into making NumPy what it is, it seems a huge project for the Clojure people to hope to produce anything comparable starting from scratch. Thanks, Alan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On Wed, Jan 9, 2013 at 9:58 AM, Nathaniel Smith n...@pobox.com wrote: On Wed, Jan 9, 2013 at 2:53 PM, Alan G Isaac alan.is...@gmail.com wrote: I'm just a Python+NumPy user and not a CS type. May I ask a naive question on this thread? Given the work that has (as I understand it) gone into making NumPy usable as a C library, why is the discussion not going in a direction like the following: What changes to the NumPy code base would be required for it to provide useful ndarray functionality in a C extension to Clojure? Is this simply incompatible with the goal that Clojure compile to JVM byte code? IIUC that work was done on a fork of numpy which has since been abandoned by its authors, so... yeah, numpy itself doesn't have much to offer in this area right now. It could in principle with a bunch of refactoring (ideally not on a fork, since we saw how well that went), but I don't think most happy current numpy users are wishing they could switch to writing Lisp on the JVM or vice-versa, so I don't think it's surprising that no-one's jumped up to do this work. If I could just point out that the attempt to fork numpy for the .NET work was done back in the subversion days, and there was little-to-no effort to incrementally merge back changes to master, and vice-versa. With git as our repository now, such work may be more feasible. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On Wed, Jan 9, 2013 at 2:35 AM, Mike Anderson wrote:

First -- is this a matrix library, or a general use nd-array library? That will drive your design a great deal.

This is very useful context - thanks! I've had opinions in favour of both an nd-array style library and a matrix library. I guess it depends on your use case which one you are more inclined to think in. I'm hoping that it should be possible for the same API to support both, i.e. you should be able to use a 2D array of numbers as a matrix, and vice-versa.

sure, but the API can/should be different -- in some sense, the numpy matrix object is really just syntactic sugar -- you can use a 2-d array as a matrix, but then you have to explicitly call linear algebra functions to get things like matrix multiplication, etc. and do some hand work to make sure you've got things the right shape -- i.e. a column or row vector where called for.

tacking on the matrix object helped this, but in practice, it gets tricky to prevent operations on a matrix from accidentally returning a plain array.

Also numpy's matrix concept does not include the concept of a row or column vector, just 1xN or Nx1 matrixes -- which works OK, but then when you iterate through a vector, you get 1x1 matrixes, rather than scalars -- a bit odd.

Anyway, it takes some thought to have two clean APIs sharing one core object.

not a bad start, but another major strength of numpy is the multiple data types - you may want to design that concept in from the start.

But I'm curious: what is the main use case for the alternative data types in NumPy? Is it for columns of data of heterogeneous types? or something else?

heterogeneous data types were added relatively recently in numpy, and are great mostly for interacting with other libraries (and some syntactic sugar uses...) that may store data in arrays of structures.

But multiple homogeneous data types are critical for saving memory, speeding operations, doing integer math when that's really called for, manipulating images, etc, etc.

20-100GB is pretty ambitious and I guess reflects the maturity of NumPy - I'd be happy with good handling of 100MB matrices right now.

100MB is pretty darn small these days -- if you're only interested in smallish problems, then you can probably forget about performance issues, and focus on a really nice API. But I'm not sure I'd bother with that -- once people start using it, they'll want to use it for big problems!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/ORR
7600 Sand Point Way NE
Seattle, WA 98115
(206) 526-6959 voice
(206) 526-6329 fax
(206) 526-6317 main reception
chris.bar...@noaa.gov
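The array-versus-matrix behaviour described above can be reproduced in a few lines. This sketch assumes a NumPy version that still ships the `np.matrix` class (recent releases warn that it is not the recommended type, so the warning is silenced here); the variable names are invented for the example.

```python
import warnings
import numpy as np

with warnings.catch_warnings():
    # np.matrix may emit a PendingDeprecationWarning on creation.
    warnings.simplefilter("ignore")
    col = np.matrix([[1.0], [2.0], [3.0]])   # 3x1 "column vector"
    items = [x for x in col]                 # iterate through the vector

item_shapes = [x.shape for x in items]       # every item is a 1x1 matrix

arr = np.array([1.0, 2.0, 3.0])              # plain 1-D array
arr_items = [x for x in arr]                 # every item is a scalar

M = np.array([[1.0, 2.0], [3.0, 4.0]])
elementwise = M * M                          # on arrays, * is element-wise
matprod = M.dot(M)                           # matrix product needs an explicit call
```

Iterating the 3x1 matrix yields three 1x1 matrices rather than three numbers, while the plain array yields scalars but needs an explicit `dot` for the matrix product, which is the trade-off between the two APIs that the message points out.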
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On 01/09/2013 04:41 PM, Benjamin Root wrote: On Wed, Jan 9, 2013 at 9:58 AM, Nathaniel Smith n...@pobox.com mailto:n...@pobox.com wrote: On Wed, Jan 9, 2013 at 2:53 PM, Alan G Isaac alan.is...@gmail.com mailto:alan.is...@gmail.com wrote: I'm just a Python+NumPy user and not a CS type. May I ask a naive question on this thread? Given the work that has (as I understand it) gone into making NumPy usable as a C library, why is the discussion not going in a direction like the following: What changes to the NumPy code base would be required for it to provide useful ndarray functionality in a C extension to Clojure? Is this simply incompatible with the goal that Clojure compile to JVM byte code? IIUC that work was done on a fork of numpy which has since been abandoned by its authors, so... yeah, numpy itself doesn't have much to offer in this area right now. It could in principle with a bunch of refactoring (ideally not on a fork, since we saw how well that went), but I don't think most happy current numpy users are wishing they could switch to writing Lisp on the JVM or vice-versa, so I don't think it's surprising that no-one's jumped up to do this work. If I could just point out that the attempt to fork numpy for the .NET work was done back in the subversion days, and there was little-to-no effort to incrementally merge back changes to master, and vice-versa. With git as our repository now, such work may be more feasible. This is a matter of personal software design taste I guess, so the following is very subjective. I don't think there's anything at all to gain from this. In 2013 (and presumably, the future), a static C or C++ library is IMO fundamentally incompatible with achieving optimal performance. Going through a major refactor simply to end up with something that's no faster and no more flexible than what NumPy is today seems sort of pointless to me. What one wants is to generate ufuncs etc. 
on the fly using LLVM that are tuned to the specific tiling pattern of a specific operation, not a static C or C++ library (even with C++ meta-programming, the combinatorial explosion kills you if you do it all at compile-time). Granted, one could probably write a C++ library that was more of a compiler, using LLVM to emit code. But that's starting all over so not really relevant to the question of a NumPy refactor.

This is how I understand Continuum thinks too, with Numba as a back-end for Blaze. (And Travis also spoke about this in his farewell address.)

Finally, Mark Florisson sort of started this with the 'minivect' library last summer, which could serve as a ufunc backend both for Cython and Numba (which for this purpose are different languages); however, as I understand it the focus is now more on developing Numba directly rather than minivect (which is understandable as that's quicker).

Dag Sverre
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On Wed, Jan 9, 2013 at 2:57 AM, Mike Anderson wrote:

I'm hoping the API will be independent of storage format - i.e. the underlying implementations can store the data any way they like. So the API will be written in terms of abstractions, and the user will have the choice of whatever concrete implementation best fits the specific needs. Sparse matrices, tiled matrices etc. should all be possible options.

A note about that -- as I think of it, numpy arrays are two things:

1) a python object for working with numbers, in a wide variety of ways
2) a wrapper around a C-array (or data block) that can be used to provide an easy way for Python to interact with C (and Fortran, and...) libraries, etc.

As it turns out a LOT of people use numpy for (2) -- what this means is that while you could change the underlying data representation, etc, and keep the same Python API -- such changes would break a lot of non-pure-python code that relies on that data representation. This is a big issue with the numpy-for-PyPy project -- they could write a numpy clone, but it would only be useful for the pure-python stuff. Even then, a number of folks do tricks with numpy arrays in python that rely on the underlying structure.

Not sure how all this would play out for Clojure, but it's something to keep in mind.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/ORR
7600 Sand Point Way NE
Seattle, WA 98115
(206) 526-6959 voice
(206) 526-6329 fax
(206) 526-6317 main reception
chris.bar...@noaa.gov
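Point (2) above, the array as a wrapper around a C data block, can be seen directly from Python. The sketch below uses the standard-library `ctypes` module to stand in for memory owned by a C library (the six-element array and the names are assumptions for the example) and wraps it in a NumPy array without copying.

```python
import ctypes
import numpy as np

# A raw C array of six doubles, standing in for memory owned by a C library.
CArray = ctypes.c_double * 6
raw = CArray(0.0, 1.0, 2.0, 3.0, 4.0, 5.0)

# Wrap the same memory as a 2x3 array; no data is copied.
arr = np.frombuffer(raw, dtype=np.float64).reshape(2, 3)

arr[1, 2] = 42.0                       # write through the NumPy view...
value_seen_from_c = raw[5]             # ...and read it back from the C array

strides = arr.strides                  # byte steps per dimension
same_address = arr.ctypes.data == ctypes.addressof(raw)
```

Code that depends on `strides`, the data pointer or the in-memory layout in this way is exactly the kind that would break if the underlying representation changed behind an otherwise identical Python API.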
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On 9 January 2013 23:09, Alan G Isaac alan.is...@gmail.com wrote:

On 1/9/2013 9:58 AM, Nathaniel Smith wrote:

I don't think most happy current numpy users are wishing they could switch to writing Lisp on the JVM or vice-versa, so I don't think it's surprising that no-one's jumped up to do this work.

Sure. I'm trying to look at this more from the Clojure end. Is it really better to start from scratch than to attempt a contribution to NumPy that would make it useful to Clojure? Given the amount of work that has gone into making NumPy what it is, it seems a huge project for the Clojure people to hope to produce anything comparable starting from scratch.

Thanks, Alan

Currently I expect that the Clojure community will produce an abstraction / API for matrices / ndarrays that supports multiple implementations. It's fairly idiomatic in Clojure to work in abstractions, and the language offers good tools for making different concrete implementations work with a common API, so it's less hard to make this work than it might sound.

An interface to NumPy could certainly be one of the implementations of this API - I'm sure people would find this very useful given the maturity of NumPy and the need for integration in environments with heterogeneous systems.

At the same time, there will be people in the Clojure world who will want to stay 100% on the JVM for certain projects. For them I don't see how NumPy could be used, unless it can be made to run well on Jython perhaps?
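The "one API, several implementations" idea can be sketched in Python, purely as an analogy to what is described for Clojure above. Everything here is hypothetical: the `Matrix` interface, its method names and the two back ends (`ListMatrix`, `NumpyMatrix`) are invented for illustration. The sketch also shows the delegation rule from the original proposal: a back end overrides an operation where it has a native version, and otherwise inherits a generic one written against the API.

```python
from abc import ABC, abstractmethod
import numpy as np

class Matrix(ABC):
    """Hypothetical common API; back ends implement the abstract methods."""

    @abstractmethod
    def shape(self): ...

    @abstractmethod
    def get(self, i, j): ...

    @abstractmethod
    def add(self, other): ...

    def to_lists(self):
        rows, cols = self.shape()
        return [[self.get(i, j) for j in range(cols)] for i in range(rows)]

    def trace(self):
        # Generic fallback, written only in terms of the API.
        rows, cols = self.shape()
        return sum(self.get(i, i) for i in range(min(rows, cols)))

class ListMatrix(Matrix):
    """Back end storing rows as plain Python lists."""
    def __init__(self, rows):
        self._rows = [[float(x) for x in r] for r in rows]
    def shape(self):
        return (len(self._rows), len(self._rows[0]))
    def get(self, i, j):
        return self._rows[i][j]
    def add(self, other):
        theirs = other.to_lists()
        return ListMatrix([[x + y for x, y in zip(r, s)]
                           for r, s in zip(self._rows, theirs)])

class NumpyMatrix(Matrix):
    """Back end storing the data in a NumPy array."""
    def __init__(self, data):
        self._a = np.array(data, dtype=np.float64)
    def shape(self):
        return self._a.shape
    def get(self, i, j):
        return float(self._a[i, j])
    def add(self, other):
        return NumpyMatrix(self._a + np.array(other.to_lists()))
    def trace(self):
        return float(np.trace(self._a))   # delegated to the back end

x = ListMatrix([[1, 2], [3, 4]])
y = NumpyMatrix([[10, 20], [30, 40]])
sum_xy = x.add(y)            # mixed back ends still interoperate via the API
sum_yx = y.add(x)
```

User code that calls only `add`, `get`, `shape` and `trace` does not need to know which back end it was given, which is the property the message is aiming for.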
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On Thu, Jan 3, 2013 at 10:29 PM, Mike Anderson mike.r.anderson...@gmail.com wrote:

> In the Clojure community there has been some discussion about creating a common matrix maths library / API. Currently there are a few different fledgeling matrix libraries in Clojure, so it seemed like a worthwhile effort to unify them and have a common base on which to build. NumPy has been something of an inspiration for this, so I thought I'd ask here to see what lessons have been learned.

A few thoughts:

> We're thinking of a matrix library

First -- is this a matrix library, or a general-use nd-array library? That will drive your design a great deal. For my part, I came from MATLAB, which started out very focused on matrices, then extended to be more generally useful. Personally, I found the matrix focus to get in the way more than help -- in any real code, the actual matrix operations are likely to be a tiny fraction of the code.

One reason I like numpy is that it is array-first, with secondary support for matrix stuff.

That being said, there is the numpy matrix type, and there are those that find it very useful, particularly in teaching situations. It feels a bit tacked-on, though, and that does get in the way, so if you want a real matrix object but also a general-purpose array lib, thinking about both up front will be helpful.

> - Support for multi-dimensional matrices (but with fast paths for 1D vectors and 2D matrices as the common cases)

What is a multi-dimensional matrix? Is a 3-d something a stack of matrices, or something else? (Note: numpy lacks this kind of object, but it is sometimes asked for -- i.e. a way to do fast matrix multiplication with a lot of small matrices.)

I think fast paths for 1-D and 2-D are secondary, though you may want easy paths for those. In particular, if you want good support for linear algebra (matrices), then having a clean and natural row vector and column vector would be nice.
See the archives of this list for a bunch of discussion about that -- and about what the weaknesses of the numpy matrix object are.

> - Immutability by default, i.e. matrix operations are pure functions that create new matrices.

I'd be careful about this -- the purity and predictability are nice, but these days a lot of time is spent allocating and moving memory around. The mutability of numpy arrays is a key feature -- indeed, the main performance issues with numpy surround the fact that many copies may be made unnecessarily (note: Dag's suggestion of lazy evaluation may mitigate this to some extent).

> - Support for 64-bit double precision floats only (this is the standard float type in Clojure)

Not a bad start, but another major strength of numpy is its multiple data types -- you may want to design that concept in from the start.

> - Ability to support multiple different back-end matrix implementations (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)

This ties in to another major strength of numpy -- ndarrays are both powerful Python objects and wrappers around standard C arrays, which makes it pretty darn easy to interface with external libraries for core computation.

HTH,

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
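The point about copies versus mutability can be made concrete with a small sketch. It is only an illustration with made-up arrays, not a benchmark: the "pure" style allocates a fresh array per operation, while the `out` argument of NumPy's ufuncs and slice views reuse existing memory.

```python
import numpy as np

a = np.arange(5, dtype=np.float64)
b = np.ones(5)

# Pure style: every operation allocates a new array.
c = a + b      # new buffer holding a + b
c = c * 2.0    # another new buffer; the first one becomes garbage

# In-place style: reuse one preallocated buffer via the `out` argument.
d = np.empty_like(a)
np.add(a, b, out=d)          # writes a + b into d
np.multiply(d, 2.0, out=d)   # overwrites d, no extra allocation

# A slice is a view: it shares memory with its parent,
# so mutating the view is visible through the original array.
v = a[1:3]
v[:] = 0.0
```

With immutability by default, the second and third patterns would need some other mechanism (buffer reuse or lazy evaluation) to keep memory traffic down.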
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On 01/04/2013 07:29 AM, Mike Anderson wrote:

> Hello all,
>
> In the Clojure community there has been some discussion about creating a common matrix maths library / API. Currently there are a few different fledgeling matrix libraries in Clojure, so it seemed like a worthwhile effort to unify them and have a common base on which to build.
>
> NumPy has been something of an inspiration for this, so I thought I'd ask here to see what lessons have been learned.
>
> We're thinking of a matrix library with roughly the following design (subject to change!)
>
> - Support for multi-dimensional matrices (but with fast paths for 1D vectors and 2D matrices as the common cases)

Food for thought: I myself have vectors that are naturally stored in 2D, matrices that are naturally stored in 4D, and so on (you can't view them that way when doing linear algebra; it's just that the indices can have multiple components) -- I like that NumPy calls everything an array; I think vector and matrix are higher-level mathematical concepts.

> - Immutability by default, i.e. matrix operations are pure functions that create new matrices. There could be a backdoor option to mutate matrices, but that would be unidiomatic in Clojure

Sounds very promising (assuming you can reuse the buffer if the input matrix has no other references and is not used again?). It's very common for NumPy arrays to fill a large chunk of the available memory (think 20-100 GB), so for those users this would need to be coupled with buffer reuse and good diagnostics that help remove references to old generations of a matrix.

> - Support for 64-bit double precision floats only (this is the standard float type in Clojure)
> - Ability to support multiple different back-end matrix implementations (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
> - A full range of matrix operations. Operations would be delegated to back end implementations where they are supported, otherwise generic implementations could be used.
> Any thoughts on this topic based on the NumPy experience? In particular would be very interesting to know:
>
> - Features in NumPy which proved to be redundant / not worth the effort
> - Features that you wish had been designed in at the start
> - Design decisions that turned out to be a particularly big mistake / success
>
> Would love to hear your insights, any ideas+advice greatly appreciated!

Travis Oliphant noted some of his thoughts on this in the recent thread "DARPA funding for Blaze and passing the NumPy torch", which is a must-read.

Dag Sverre
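One possible reading of the "vectors stored in 2D, matrices stored in 4D" remark, sketched in NumPy: a field on a grid is a vector whose index is a pair (i, j), and a linear operator on that field is a matrix whose row and column indices are both pairs. The grid size and random data below are invented purely for illustration.

```python
import numpy as np

ny, nx = 3, 4
rng = np.random.default_rng(0)

# A "vector" whose single logical index is the pair (i, j).
x = rng.standard_normal((ny, nx))

# A "matrix" whose row index is (i, j) and column index is (k, l).
A = rng.standard_normal((ny, nx, ny, nx))

# Apply the operator directly on the n-d layout by contracting (k, l)...
y_nd = np.tensordot(A, x, axes=([2, 3], [0, 1]))

# ...or flatten to ordinary 2-D / 1-D shapes for linear algebra routines.
A_mat = A.reshape(ny * nx, ny * nx)   # a view, no copy for contiguous data
x_vec = x.reshape(ny * nx)
y_vec = A_mat @ x_vec
```

Both routes give the same numbers; only the shape attached to the underlying buffer differs, which is the sense in which "array" is the lower-level concept.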
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On 01/04/2013 09:00 AM, Dag Sverre Seljebotn wrote:

> On 01/04/2013 07:29 AM, Mike Anderson wrote:
>> Hello all,
>>
>> In the Clojure community there has been some discussion about creating a common matrix maths library / API. Currently there are a few different fledgeling matrix libraries in Clojure, so it seemed like a worthwhile effort to unify them and have a common base on which to build.
>>
>> NumPy has been something of an inspiration for this, so I thought I'd ask here to see what lessons have been learned.
>>
>> We're thinking of a matrix library with roughly the following design (subject to change!)
>>
>> - Support for multi-dimensional matrices (but with fast paths for 1D vectors and 2D matrices as the common cases)
>
> Food for thought: I myself have vectors that are naturally stored in 2D, matrices that are naturally stored in 4D, and so on (you can't view them that way when doing linear algebra; it's just that the indices can have multiple components) -- I like that NumPy calls everything an array; I think vector and matrix are higher-level mathematical concepts.
>
>> - Immutability by default, i.e. matrix operations are pure functions that create new matrices. There could be a backdoor option to mutate matrices, but that would be unidiomatic in Clojure
>
> Sounds very promising (assuming you can reuse the buffer if the input matrix has no other references and is not used again?). It's very common for NumPy arrays to fill a large chunk of the available memory (think 20-100 GB), so for those users this would need to be coupled with buffer reuse and good diagnostics that help remove references to old generations of a matrix.

Oh: Depending on your ambitions, it's worth thinking hard about i) storage format, and ii) lazy evaluation.

Storage format: The new trend is for more flexible formats than just column-major/row-major, e.g., storing cache-sized n-dimensional tiles.
Lazy evaluation: The big problem with numpy is that a + b + np.sqrt(c) will first make a temporary result for a + b, rather than doing the whole expression on the fly, which is *very* bad for performance.

So if you want immutability, I urge you to consider having every operation build up an expression tree/program, and then either find the smart points at which to interpret that program automatically, or make an explicit eval() of an expression tree the default mode.

Of course this all depends on how ambitious you are.

It's probably best to have a look at all the projects designed to get around NumPy's shortcomings:

- Blaze (in development, continuum.io)
- Theano
- Numexpr

Related:

- HDF chunks
- To some degree Cython

Dag Sverre

>> - Support for 64-bit double precision floats only (this is the standard float type in Clojure)
>> - Ability to support multiple different back-end matrix implementations (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
>> - A full range of matrix operations. Operations would be delegated to back end implementations where they are supported, otherwise generic implementations could be used.
>>
>> Any thoughts on this topic based on the NumPy experience? In particular would be very interesting to know:
>>
>> - Features in NumPy which proved to be redundant / not worth the effort
>> - Features that you wish had been designed in at the start
>> - Design decisions that turned out to be a particularly big mistake / success
>>
>> Would love to hear your insights, any ideas+advice greatly appreciated!
>
> Travis Oliphant noted some of his thoughts on this in the recent thread "DARPA funding for Blaze and passing the NumPy torch", which is a must-read.
>
> Dag Sverre
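A minimal sketch of the expression-tree idea, assuming 1-D float arrays of equal length. The names `Expr`, `wrap` and `sqrt` are hypothetical helpers written for this example, not part of NumPy or of any of the projects listed. Operations only record a tree; an explicit eval() then walks it block by block, so temporaries are block-sized rather than full-sized, which is similar in spirit to what blocked evaluators do.

```python
import numpy as np

class Expr:
    """Node of a tiny expression tree; nothing is computed until eval()."""

    def __init__(self, op, args):
        self.op = op      # ufunc applied to the children, or None for a leaf
        self.args = args  # child Expr nodes, or (ndarray,) for a leaf

    def __add__(self, other):
        return Expr(np.add, (self, wrap(other)))

    def length(self):
        # Leaves know their length; inner nodes ask their first child.
        first = self.args[0]
        return len(first) if self.op is None else first.length()

    def _eval_block(self, sl):
        # Evaluate this node on one slice of the data only.
        if self.op is None:
            return self.args[0][sl]
        return self.op(*(child._eval_block(sl) for child in self.args))

    def eval(self, block=4096):
        n = self.length()
        out = np.empty(n)
        for start in range(0, n, block):
            sl = slice(start, min(start + block, n))
            out[sl] = self._eval_block(sl)  # temporaries are at most `block` long
        return out

def wrap(x):
    """Turn an array into a leaf node (hypothetical helper)."""
    return x if isinstance(x, Expr) else Expr(None, (np.asarray(x, dtype=float),))

def sqrt(x):
    return Expr(np.sqrt, (wrap(x),))

a = np.arange(10.0)
b = np.ones(10)
c = np.full(10, 4.0)

expr = wrap(a) + b + sqrt(c)   # builds a tree, computes nothing yet
result = expr.eval(block=4)    # explicit evaluation, four elements at a time
```

The design choice here is the "explicit eval() as the default mode" option from the message; deciding automatically when to evaluate is the harder alternative.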
[Numpy-discussion] Insights / lessons learned from NumPy design
Hello all,

In the Clojure community there has been some discussion about creating a common matrix maths library / API. Currently there are a few different fledgeling matrix libraries in Clojure, so it seemed like a worthwhile effort to unify them and have a common base on which to build.

NumPy has been something of an inspiration for this, so I thought I'd ask here to see what lessons have been learned.

We're thinking of a matrix library with roughly the following design (subject to change!)

- Support for multi-dimensional matrices (but with fast paths for 1D vectors and 2D matrices as the common cases)
- Immutability by default, i.e. matrix operations are pure functions that create new matrices. There could be a backdoor option to mutate matrices, but that would be unidiomatic in Clojure
- Support for 64-bit double precision floats only (this is the standard float type in Clojure)
- Ability to support multiple different back-end matrix implementations (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
- A full range of matrix operations. Operations would be delegated to back end implementations where they are supported, otherwise generic implementations could be used.

Any thoughts on this topic based on the NumPy experience? In particular it would be very interesting to know:

- Features in NumPy which proved to be redundant / not worth the effort
- Features that you wish had been designed in at the start
- Design decisions that turned out to be a particularly big mistake / success

Would love to hear your insights, any ideas+advice greatly appreciated!

Mike.
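The last design point (delegate to a back end where an operation is supported, otherwise use a generic implementation) could look roughly like the sketch below. It is written in Python rather than Clojure only to match the NumPy context of this list, and `GenericBackend`, `NumpyBackend` and `dispatch` are hypothetical names invented for the example, not part of any library mentioned in the thread.

```python
import numpy as np

class GenericBackend:
    """Fallback implementations written against plain nested lists."""

    def add(self, a, b):
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

    def matmul(self, a, b):
        cols = list(zip(*b))  # columns of b
        return [[sum(x * y for x, y in zip(row, col)) for col in cols]
                for row in a]

class NumpyBackend:
    """A back end that accelerates matmul only; it defines no `add`."""

    def matmul(self, a, b):
        a = np.asarray(a, dtype=np.float64)
        b = np.asarray(b, dtype=np.float64)
        return (a @ b).tolist()

def dispatch(backend, op, *args, fallback=GenericBackend()):
    """Use the back end's implementation of `op` if it has one, else the generic one."""
    impl = getattr(backend, op, None)
    if impl is None:
        impl = getattr(fallback, op)
    return impl(*args)

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

product = dispatch(NumpyBackend(), "matmul", a, b)  # handled by the back end
total = dispatch(NumpyBackend(), "add", a, b)       # falls back to the generic code
```

Converting to and from nested lists on every call is obviously wasteful; a real design would keep data in the back end's own representation, which is where the storage-format concerns raised in the replies come in.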