Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-14 Thread Mike Anderson
Just wanted to say a big thanks to everyone in the NumPy community who has
commented on this topic - it's given us a lot to think about and a lot of
good ideas to work into the design!

Best regards,

   Mike.

On 4 January 2013 14:29, Mike Anderson mike.r.anderson...@gmail.com wrote:

 Hello all,

 In the Clojure community there has been some discussion about creating a
 common matrix maths library / API. Currently there are a few different
 fledgeling matrix libraries in Clojure, so it seemed like a worthwhile
 effort to unify them and have a common base on which to build.

 NumPy has been something of an inspiration for this, so I thought I'd ask
 here to see what lessons have been learned.

 We're thinking of a matrix library with roughly the following design
 (subject to change!)
 - Support for multi-dimensional matrices (but with fast paths for 1D
 vectors and 2D matrices as the common cases)
 - Immutability by default, i.e. matrix operations are pure functions that
 create new matrices. There could be a backdoor option to mutate matrices,
 but that would be unidiomatic in Clojure
 - Support for 64-bit double precision floats only (this is the standard
 float type in Clojure)
 - Ability to support multiple different back-end matrix implementations
 (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
 - A full range of matrix operations. Operations would be delegated to back
 end implementations where they are supported, otherwise generic
 implementations could be used.
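
A minimal sketch of the delegation idea in the last two bullets, written in Python rather than Clojure purely for illustration. The names `Backend`, `GENERIC`, `generic_matmul` and `generic_transpose` are hypothetical and come from none of the libraries listed; nested lists stand in for whatever storage a real back end would use.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]  # nested lists stand in for any storage format


def generic_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Generic fallback: textbook triple loop."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def generic_transpose(a: Matrix) -> Matrix:
    """Generic fallback: swap rows and columns."""
    return [list(col) for col in zip(*a)]


# Generic implementations, used when a back end lacks an operation.
GENERIC: Dict[str, Callable] = {
    "matmul": generic_matmul,
    "transpose": generic_transpose,
}


class Backend:
    """Hypothetical back end: registers only the operations it supports."""

    def __init__(self, name: str, ops: Dict[str, Callable]):
        self.name = name
        self.ops = ops

    def call(self, op: str, *args):
        # Delegate to the back end if it supports the operation,
        # otherwise fall back to the generic implementation.
        impl = self.ops.get(op, GENERIC[op])
        return impl(*args)


# A toy back end that only knows how to transpose.
toy = Backend("toy", {"transpose": lambda m: [list(c) for c in zip(*m)]})

a = [[1.0, 2.0], [3.0, 4.0]]
print(toy.call("transpose", a))   # served by the back end
print(toy.call("matmul", a, a))   # served by the generic fallback
```

The caller never needs to know which path was taken, which is what would let back ends differ in coverage without changing user code.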

 Any thoughts on this topic based on the NumPy experience? In particular
 would be very interesting to know:
 - Features in NumPy which proved to be redundant / not worth the effort
 - Features that you wish had been designed in at the start
 - Design decisions that turned out to be a particularly big mistake /
 success

 Would love to hear your insights, any ideas+advice greatly appreciated!

Mike.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-13 Thread Mike Anderson
On 10 January 2013 05:19, Chris Barker - NOAA Federal chris.bar...@noaa.gov
 wrote:

 On Wed, Jan 9, 2013 at 2:57 AM, Mike Anderson

  I'm hoping the API will be independent of storage format - i.e. the
  underlying implementations can store the data any way they like. So the
  API will be written in terms of abstractions, and the user will have the
  choice of whatever concrete implementation best fits the specific needs.
  Sparse matrices, tiled matrices etc. should all be possible options.

 A note about that -- as I think of it, numpy arrays are two things:

 1) a python object for working with numbers, in a wide variety of ways

 2) a wrapper around a C-array (or data block) that can be used to
 provide an easy way for Python to interact with C (and Fortran, and...)
 libraries, etc.

 As it turns out a LOT of people use numpy for (2) -- what this means
 is that while you could change the underlying data representation,
 etc, and keep the same Python API -- such changes would break a lot of
 non-pure-python code that relies on that data representation.

 This is a big issue with the numpy-for-PyPy project -- they could
 write a numpy clone, but it would only be useful for the pure-python
 stuff.

 Even then, a number of folks do tricks with numpy arrays in python
 that rely on the underlying structure.

 Not sure how all this would play out for Clojure, but it's something
 to keep in mind.


Thanks Chris -  this is a really helpful insight.

Trying to translate that into the Clojure world, I think that's roughly
equivalent to separating the API (roughly equivalent to the methods of the
ndarray referred to in 1) from the specific implementations (which will
probably include a data block ndarray-style wrapper like 2, but would also
leave open other implementation options).

That way the majority of users can code purely against the API, and they
won't be affected if (when?) the underlying implementation changes. In this
way, they should be able to get the benefits of 2) without building a
direct dependency on it.

Of course, I still expect some users to circumvent the API and build a
dependency on the underlying implementation. Nothing we can do to stop
that, and they may even have good reasons like hardcore performance
optimization. We have to assume at that point they know what they are doing
and are prepared to live with the consequences :-)


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Mike Anderson
On 8 January 2013 02:08, Chris Barker - NOAA Federal
chris.bar...@noaa.gov wrote:

 On Thu, Jan 3, 2013 at 10:29 PM, Mike Anderson
 mike.r.anderson...@gmail.com wrote:
  In the Clojure community there has been some discussion about creating a
  common matrix maths library / API. Currently there are a few different
  fledgeling matrix libraries in Clojure, so it seemed like a worthwhile
  effort to unify them and have a common base on which to build.
 
  NumPy has been something of an inspiration for this, so I thought I'd ask
  here to see what lessons have been learned.

 A few thoughts:

  We're thinking of a matrix library

 First -- is this a matrix library, or a general use nd-array
 library? That will drive your design a great deal. For my part, I came
 from MATLAB, which started out very focused on matrixes, then extended
 to be more generally useful. Personally, I found the matrix-focus to
 get in the way more than help -- in any real code, the actual
 matrix operations are likely to be a tiny fraction of the code.

 One reason I like numpy is that it is array-first, with secondary
 support for matrix stuff.

 That being said, there is the numpy matrix type, and there are those
 that find it very useful, particularly in teaching situations, though
 it feels a bit tacked-on, and that does get in the way. So if you
 want a real matrix object, but also a general purpose array lib,
 thinking about both up front will be helpful.


This is very useful context - thanks! I've had opinions in favour of both
an nd-array style library and a matrix library. I guess it depends on your
use case which one you are more inclined to think in.

I'm hoping that it should be possible for the same API to support both,
i.e. you should be able to use a 2D array of numbers as a matrix, and
vice-versa.



  - Support for multi-dimensional matrices (but with fast paths for 1D
  vectors and 2D matrices as the common cases)

 what is a multi-dimensional matrix? -- is a 3-d something, a stack of
 matrixes? or something else? (note, numpy lacks this kind of object,
 but it is sometimes asked for -- i.e. a way to do fast matrix
 multiplication with a lot of small matrixes)

 I think fast paths for 1-D and 2-D is secondary, though you may want
 easy paths for those. In particular, if you want good support for
 linear algebra (matrixes), then having a clean and natural row vector
 and column vector would be nice. See the archives of this list for
 a bunch of discussion about that -- and what the weaknesses are of the
 numpy matrix object.

  - Immutability by default, i.e. matrix operations are pure functions that
  create new matrices.

 I'd be careful about this -- the purity and predictability is nice,
 but these days a lot of time is spent allocating and moving memory
 around -- numpy arrays' mutability is a major key feature -- indeed,
 the key performance issues with numpy surround the fact that many
 copies may be made unnecessarily (note, Dag's suggestion of lazy
 evaluation may mitigate this to some extent).


Interesting and very useful to know. Sounds like we should definitely allow
for mutable arrays / zero-copy operations in that case if that is proving
to be a big bottleneck.
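
A minimal sketch of the copy-versus-mutate trade-off discussed above, using NumPy's `out=` argument on ufuncs. The array size and values are arbitrary, and how many temporaries a given NumPy version really allocates for the first expression is an implementation detail not measured here.

```python
import numpy as np

n = 1_000_000
a = np.ones(n)
b = np.full(n, 2.0)
c = np.full(n, 9.0)

# Pure style: conceptually each step returns a fresh array, so a + b and
# np.sqrt(c) are both materialised before the final sum is allocated.
pure = a + b + np.sqrt(c)

# Mutable style: allocate one buffer and keep writing into it.
out = np.sqrt(c)            # the only allocation
np.add(out, a, out=out)     # out += a, in place
np.add(out, b, out=out)     # out += b, in place

print(np.allclose(pure, out))
```

Both styles give the same numbers; the second trades referential transparency for a single n-element allocation.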



  - Support for 64-bit double precision floats only (this is the standard
  float type in Clojure)

 not a bad start, but another major strength of numpy is the multiple
 data types - you may want to design that concept in from the start.


Sounds like good advice and that should be possible to accommodate in the
design.

But I'm curious: what is the main use case for the alternative data types
in NumPy? Is it for columns of data of heterogeneous types? or something
else?



  - Ability to support multiple different back-end matrix implementations
  (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)

 This ties in to another major strength of numpy -- ndarrays are both
 powerful python objects, and wrappers around standard C arrays -- that
 makes it pretty darn easy to interface with external libraries for
 core computation.


Great - good to know we are on the right track with this one.

Thanks Chris for all your comments / suggestions!


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Mike Anderson
On 4 January 2013 16:00, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote:

 On 01/04/2013 07:29 AM, Mike Anderson wrote:
  Hello all,
 
  In the Clojure community there has been some discussion about creating a
  common matrix maths library / API. Currently there are a few different
  fledgeling matrix libraries in Clojure, so it seemed like a worthwhile
  effort to unify them and have a common base on which to build.
 
  NumPy has been something of an inspiration for this, so I thought I'd ask
  here to see what lessons have been learned.
 
  We're thinking of a matrix library with roughly the following design
  (subject to change!)
  - Support for multi-dimensional matrices (but with fast paths for 1D
  vectors and 2D matrices as the common cases)

 Food for thought: Myself I have vectors that are naturally stored in 2D,
 matrices that can be naturally stored in 4D and so on (you can't view
 them that way when doing linear algebra, it's just that the indices can
 have multiple components) -- I like that NumPy calls everything array;
 I think vector and matrix are higher-level mathematical concepts.


Very interesting. Can I ask what the application is? And is it equivalent
from a mathematical perspective to flattening the 2D vectors into very long
1D vectors?



  - Immutability by default, i.e. matrix operations are pure functions
  that create new matrices. There could be a backdoor option to mutate
  matrices, but that would be unidiomatic in Clojure

 Sounds very promising (assuming you can reuse the buffer if the input
 matrix had no other references and is not used again?). It's very common
 for NumPy arrays to fill a large chunk of the available memory (think
 20-100 GB), so for those users this would need to be coupled with buffer
 reuse and good diagnostics that help remove references to old
 generations of a matrix.


Yes it should be possible to re-use buffers, though to some extent that
would depend on the underlying matrix library implementation. The JVM makes
things a bit interesting here - the GC is extremely good but it doesn't
play particularly nicely with non-Java native code.

20-100GB is pretty ambitious and I guess reflects the maturity of NumPy -
 I'd be happy with good handling of 100MB matrices right now.



  - Support for 64-bit double precision floats only (this is the standard
  float type in Clojure)
  - Ability to support multiple different back-end matrix implementations
  (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
  - A full range of matrix operations. Operations would be delegated to
  back end implementations where they are supported, otherwise generic
  implementations could be used.
 
  Any thoughts on this topic based on the NumPy experience? In particular
  would be very interesting to know:
  - Features in NumPy which proved to be redundant / not worth the effort
  - Features that you wish had been designed in at the start
  - Design decisions that turned out to be a particularly big mistake /
  success
 
  Would love to hear your insights, any ideas+advice greatly appreciated!

 Travis Oliphant noted some of his thoughts on this in the recent thread
 "DARPA funding for Blaze and passing the NumPy torch", which is a must-read.


Great link. Thanks for this and all your other comments!


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Mike Anderson
On 4 January 2013 16:13, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote:

 On 01/04/2013 09:00 AM, Dag Sverre Seljebotn wrote:
  On 01/04/2013 07:29 AM, Mike Anderson wrote:
 snip

 Oh: Depending on your ambitions, it's worth thinking hard about i)
 storage format, and ii) lazy evaluation.

 Storage format: The new trend is for more flexible formats than just
 column-major/row-major, e.g., storing cache-sized n-dimensional tiles.


I'm hoping the API will be independent of storage format - i.e. the
underlying implementations can store the data any way they like. So the API
will be written in terms of abstractions, and the user will have the choice
of whatever concrete implementation best fits the specific needs. Sparse
matrices, tiled matrices etc. should all be possible options.

Has this kind of approach been used much with NumPy?



 Lazy evaluation: The big problem with numpy is that a + b + np.sqrt(c)
 will first make a temporary result for a + b, rather than doing the
 whole expression on the fly, which is *very* bad for performance.

 So if you want immutability, I urge you to consider every operation to
 build up an expression tree/program, and then either find out the
 smart points where you interpret that program automatically, or make
 explicit eval() of an expression tree the default mode.


Very interesting. Seems like this could be layered on top though? i.e. have
a separate DSL for building up the expression tree, then compile this down
to the optimal set of underlying operations?
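
A minimal sketch of such a layered expression tree, in plain Python over lists. The classes `Expr`, `Leaf`, `Add` and `Sqrt` are hypothetical and only illustrate the idea: operators build nodes, and nothing is computed until `evaluate()` is called, at which point each output element is produced in one pass without materialising `a + b`.

```python
import math


class Expr:
    """Node in a tiny, hypothetical expression tree over equal-length vectors."""

    def __add__(self, other):
        return Add(self, other)

    def evaluate(self):
        # Single pass: compute each output element straight from the leaves,
        # without building a + b as an intermediate vector.
        return [self.at(i) for i in range(len(self))]


class Leaf(Expr):
    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]


class Add(Expr):
    def __init__(self, left, right):
        assert len(left) == len(right)
        self.left, self.right = left, right

    def __len__(self):
        return len(self.left)

    def at(self, i):
        return self.left.at(i) + self.right.at(i)


class Sqrt(Expr):
    def __init__(self, arg):
        self.arg = arg

    def __len__(self):
        return len(self.arg)

    def at(self, i):
        return math.sqrt(self.arg.at(i))


a, b, c = Leaf([1.0, 2.0]), Leaf([3.0, 4.0]), Leaf([9.0, 16.0])
expr = a + b + Sqrt(c)     # builds a tree; nothing is computed yet
result = expr.evaluate()   # explicit evaluation
print(result)              # [7.0, 10.0]
```

A real implementation would compile the tree to fused loops over the back end's storage rather than interpreting it element by element; the sketch only shows where the evaluation boundary sits.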



 Of course this depends all on how ambitious you are.


A little ambitious, though mostly I'll be glad to get something working
that people find useful :-)

Thanks again for your comments Dag!


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Dag Sverre Seljebotn
On 01/09/2013 11:49 AM, Mike Anderson wrote:
 On 4 January 2013 16:00, Dag Sverre Seljebotn
 d.s.seljeb...@astro.uio.no wrote:

 On 01/04/2013 07:29 AM, Mike Anderson wrote:
   Hello all,
  
   In the Clojure community there has been some discussion about creating a
   common matrix maths library / API. Currently there are a few different
   fledgeling matrix libraries in Clojure, so it seemed like a worthwhile
   effort to unify them and have a common base on which to build.
  
   NumPy has been something of an inspiration for this, so I thought I'd ask
   here to see what lessons have been learned.
  
   We're thinking of a matrix library with roughly the following design
   (subject to change!)
   - Support for multi-dimensional matrices (but with fast paths for 1D
   vectors and 2D matrices as the common cases)

 Food for thought: Myself I have vectors that are naturally stored in 2D,
 matrices that can be naturally stored in 4D and so on (you can't view
 them that way when doing linear algebra, it's just that the indices can
 have multiple components) -- I like that NumPy calls everything array;
 I think vector and matrix are higher-level mathematical concepts.


 Very interesting. Can I ask what the application is? And is it
 equivalent from a mathematical perspective to flattening the 2D vectors
 into very long 1D vectors?

For instance, if you are solving an equation for one value per grid 
point on a 2D or 3D grid. In PDE problems this occurs all the time, 
though normally the flattening is treated explicitly before one gets to 
solving the equation, and when not a reshape operation like you say is 
usually OK (but the very concept of flattening/reshaping is something
that's inherent to arrays, not matrices).

Chris also mentioned the case where you have lots of small matrices 
(say, A[i,j,k] is element (i,j) in matrix k), and you want to multiply 
all matrices by the same vector, or all matrices by different vectors, 
and so on.
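
A minimal sketch of the "lots of small matrices" case with `np.einsum`, using the same index convention as above (`A[i, j, k]` is element (i, j) of matrix k). The sizes and random data are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 3, 5
A = rng.standard_normal((n, n, K))   # A[i, j, k] is element (i, j) of matrix k
v = rng.standard_normal(n)           # one vector shared by all matrices
V = rng.standard_normal((n, K))      # V[:, k] is the vector for matrix k

same = np.einsum("ijk,j->ik", A, v)    # same[:, k] = matrix k times v
each = np.einsum("ijk,jk->ik", A, V)   # each[:, k] = matrix k times V[:, k]

# Check against an explicit loop over the K small matrices.
for k in range(K):
    assert np.allclose(same[:, k], A[:, :, k].dot(v))
    assert np.allclose(each[:, k], A[:, :, k].dot(V[:, k]))
```

This only works because the data is an n-dimensional array first; the matrix interpretation is applied per slice.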


   - Immutability by default, i.e. matrix operations are pure functions
   that create new matrices. There could be a backdoor option to mutate
   matrices, but that would be unidiomatic in Clojure

 Sounds very promising (assuming you can reuse the buffer if the input
 matrix had no other references and is not used again?). It's very common
 for NumPy arrays to fill a large chunk of the available memory (think
 20-100 GB), so for those users this would need to be coupled with buffer
 reuse and good diagnostics that help remove references to old
 generations of a matrix.


 Yes it should be possible to re-use buffers, though to some extent that
 would depend on the underlying matrix library implementation. The JVM
 makes things a bit interesting here - the GC is extremely good but it
 doesn't play particularly nicely with non-Java native code.

My hunch is that if you rely on the GC you'll get nowhere (though
if you're happy to treat 100 MB matrices then that may not matter so much).

 20-100GB is pretty ambitious and I guess reflects the maturity of NumPy
 -  I'd be happy with good handling of 100MB matrices right now.

Still, if you copy 100 MB every time you assign to a single element, 
performance won't be stellar to say the least. I don't know Clojure but 
I'm thinking that an immutable design would be something like

b = a but with 1.0 in position (0, 3)
c = b + (3.2 in position (3, 4))

however you want to express that syntax-wise.
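
One way to spell those two lines with NumPy, as a minimal sketch. The helper `with_value` is hypothetical, not a NumPy function, and it copies the whole buffer on every call, which is exactly the cost being warned about for large arrays.

```python
import numpy as np


def with_value(a, index, value):
    """Return a read-only copy of `a` with `value` stored at `index`.

    Hypothetical helper: the full copy is what makes this expensive
    for large arrays.
    """
    b = a.copy()
    b[index] = value
    b.setflags(write=False)
    return b


a = np.zeros((4, 5))
a.setflags(write=False)                      # treat `a` as immutable

b = with_value(a, (0, 3), 1.0)               # b = a but with 1.0 in position (0, 3)
c = with_value(b, (3, 4), b[3, 4] + 3.2)     # c = b + (3.2 in position (3, 4))

print(a[0, 3], b[0, 3], c[3, 4])             # 0.0 1.0 3.2
```

A persistent data structure or buffer reuse would be needed to avoid the copy; the sketch makes no attempt at either.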

Pasting in your other post:

On 01/09/2013 11:57 AM, Mike Anderson wrote:
  On 4 January 2013 16:13,
  I'm hoping the API will be independent of storage format - i.e. the
  underlying implementations can store the data any way they like. So the
  API will be written in terms of abstractions, and the user will have the
  choice of whatever concrete implementation best fits the specific needs.
  Sparse matrices, tiled matrices etc. should all be possible options.
 
  Has this kind of approach been used much with NumPy?

No, NumPy only supports strided arrays. SciPy has sparse matrices using 
a different API (which is a pain point).
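
For readers unfamiliar with the term, a short sketch of what "strided" means in practice: an ndarray is one block of memory plus a shape and a byte step per axis, so transposes and regular slices are views rather than copies. The array contents here are arbitrary.

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)   # C-contiguous, 8-byte items
print(a.strides)    # (32, 8): 32 bytes to the next row, 8 to the next column

t = a.T             # transpose: same buffer, swapped strides
print(t.strides)    # (8, 32)

s = a[::2, ::2]     # every other row and column: same buffer, doubled strides
print(s.strides)    # (64, 16)

# All three are views onto one block of memory; no data was copied.
print(np.shares_memory(a, t), np.shares_memory(a, s))
```

Sparse or tiled layouts cannot be described by a single shape-and-strides pair, which is why they sit outside the ndarray model.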

  Lazy evaluation: The big problem with numpy is that a + b + np.sqrt(c)
  will first make a temporary result for a + b, rather than doing the
  whole expression on the fly, which is *very* bad for performance.
 
  So if you want immutability, I urge you to consider every operation to
  build up an expression tree/program, and then either find out the
  smart points where you interpret that program automatically, or make
  explicit eval() of an expression tree the default mode.
 
 
  Very interesting. Seems like this could be layered on top though? i.e.
  have a separate DSL for building up the expression tree, then compile
  this down to the optimal set of underlying operations?

That's what 

Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Daπid
On Jan 9, 2013 11:35 AM, Mike Anderson mike.r.anderson...@gmail.com wrote:
 But I'm curious: what is the main use case for the alternative data types
 in NumPy? Is it for columns of data of heterogeneous types? or something
 else?
In my case, I have used 32 bit (or lower) arrays due to memory limitations
and for significant speedups in certain situations. This was particularly
useful when I was preprocessing numerous arrays, especially Boolean data:
it saved a lot of hard disk space and I/O. I have used 128 bits when
precision was critical, as I was dealing with very small differences.
It is also nice to be able to repeat your computation with different
precision in order to spot possible numerical instabilities, even if the
performance is not great.
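
A minimal sketch of repeating one computation at two precisions to expose a sensitivity. The function `small_difference` and the constant 1e-8 are illustrative assumptions, not taken from any real workload.

```python
import numpy as np


def small_difference(dtype):
    """Compute (1 + 1e-8) - 1 entirely in the given floating-point type."""
    one = dtype(1.0)
    eps = dtype(1e-8)
    return (one + eps) - one


lo = small_difference(np.float32)   # the increment is below float32 resolution
hi = small_difference(np.float64)   # float64 keeps it, up to rounding

print(lo)   # 0.0
print(hi)   # close to 1e-8
```

When the two runs disagree this badly, the computation depends on digits the narrower type cannot hold.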

David.


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Alan G Isaac
I'm just a Python+NumPy user and not a CS type.
May I ask a naive question on this thread?

Given the work that has (as I understand it) gone into
making NumPy usable as a C library, why is the discussion not
going in a direction like the following:
What changes to the NumPy code base would be required for it
to provide useful ndarray functionality in a C extension
to Clojure?  Is this simply incompatible with the goal that
Clojure compile to JVM byte code?

Thanks,
Alan Isaac


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Nathaniel Smith
On Wed, Jan 9, 2013 at 2:53 PM, Alan G Isaac alan.is...@gmail.com wrote:
 I'm just a Python+NumPy user and not a CS type.
 May I ask a naive question on this thread?

 Given the work that has (as I understand it) gone into
 making NumPy usable as a C library, why is the discussion not
 going in a direction like the following:
 What changes to the NumPy code base would be required for it
 to provide useful ndarray functionality in a C extension
 to Clojure?  Is this simply incompatible with the goal that
 Clojure compile to JVM byte code?

IIUC that work was done on a fork of numpy which has since been
abandoned by its authors, so... yeah, numpy itself doesn't have much
to offer in this area right now. It could in principle with a bunch of
refactoring (ideally not on a fork, since we saw how well that went),
but I don't think most happy current numpy users are wishing they
could switch to writing Lisp on the JVM or vice-versa, so I don't
think it's surprising that no-one's jumped up to do this work.

-n


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Alan G Isaac
On 1/9/2013 9:58 AM, Nathaniel Smith wrote:
 I don't think most happy current numpy users are wishing they
 could switch to writing Lisp on the JVM or vice-versa, so I don't
 think it's surprising that no-one's jumped up to do this work.


Sure.  I'm trying to look at this more from the Clojure end.
Is it really better to start from scratch than to attempt
a contribution to NumPy that would make it useful to Clojure?
Given the amount of work that has gone into making NumPy
what it is, it seems a huge project for the Clojure people
to hope to produce anything comparable starting from scratch.

Thanks,
Alan



Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Benjamin Root
On Wed, Jan 9, 2013 at 9:58 AM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, Jan 9, 2013 at 2:53 PM, Alan G Isaac alan.is...@gmail.com wrote:
  I'm just a Python+NumPy user and not a CS type.
  May I ask a naive question on this thread?
 
  Given the work that has (as I understand it) gone into
  making NumPy usable as a C library, why is the discussion not
  going in a direction like the following:
  What changes to the NumPy code base would be required for it
  to provide useful ndarray functionality in a C extension
  to Clojure?  Is this simply incompatible with the goal that
  Clojure compile to JVM byte code?

 IIUC that work was done on a fork of numpy which has since been
 abandoned by its authors, so... yeah, numpy itself doesn't have much
 to offer in this area right now. It could in principle with a bunch of
 refactoring (ideally not on a fork, since we saw how well that went),
 but I don't think most happy current numpy users are wishing they
 could switch to writing Lisp on the JVM or vice-versa, so I don't
 think it's surprising that no-one's jumped up to do this work.


If I could just point out that the attempt to fork numpy for the .NET work
was done back in the subversion days, and there was little-to-no effort to
incrementally merge back changes to master, and vice-versa.  With git as
our repository now, such work may be more feasible.

Ben Root


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Chris Barker - NOAA Federal
On Wed, Jan 9, 2013 at 2:35 AM, Mike Anderson

 First -- is this a matrix library, or a general use nd-array
 library? That will drive your design a great deal.

 This is very useful context - thanks! I've had opinions in favour of both an
 nd-array style library and a matrix library. I guess it depends on your use
 case which one you are more inclined to think in.

 I'm hoping that it should be possible for the same API to support both, i.e.
 you should be able to use a 2D array of numbers as a matrix, and vice-versa.

sure, but the API can/should be different -- in some sense, the numpy
matrix object is really just syntactic sugar -- you can use a 2-d
array as a matrix, but then you have to explicitly call linear algebra
functions to get things like matrix multiplication, etc. and do some
hand work to make sure you've got things the right shape -- i.e. a
column or row vector where called for.

tacking on the matrix object helped this, but in practice, it gets
tricky to prevent operations from accidentally returning a plain array
from operations on a matrix.

Also numpy's matrix concept does not include the concept of a row or
column vector, just 1XN or NX1 matrixes -- which works OK, but then
when you iterate through a vector, you get 1X1 matrixes, rather than
scalars -- a bit odd.
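
A short sketch of the behaviours described above, side by side. It uses `np.matrix`, which still exists but which current NumPy documentation discourages for new code; the values are arbitrary.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([1.0, 1.0])           # 1-D: neither a row nor a column

print(a * a)                       # elementwise product
print(np.dot(a, a))                # matrix product needs an explicit call

col = v[:, np.newaxis]             # hand work: make an explicit (2, 1) column
print(np.dot(a, col).shape)        # (2, 1)

m = np.matrix(a)                   # with np.matrix, * is matrix multiplication
print(m * m)

mcol = np.matrix([[1.0], [2.0], [3.0]])   # an N x 1 "column vector"
shapes = [item.shape for item in mcol]    # iterating yields 1 x 1 matrices
print(shapes)                             # [(1, 1), (1, 1), (1, 1)]
```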

Anyway, it takes some thought to have two clean APIs sharing one core object.

 not a bad start, but another major strength of numpy is the multiple
 data types - you may want to design that concept in from the start.

 But I'm curious: what is the main use case for the alternative data types in
 NumPy? Is it for columns of data of heterogeneous types? or something else?

heterogeneous data types were added relatively recently in numpy, and
are great mostly for interacting with other libraries (and some
syntactic sugar uses...) that may store data in arrays of structures.

But multiple homogeneous data types are critical for saving memory,
speeding operations, doing integer math when that's really called for,
manipulating images, etc, etc.
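
A small sketch of what the choice of element type buys, with arbitrary sizes. The 480 x 640 greyscale image is an assumed example, not something from this thread.

```python
import numpy as np

n = 1_000_000
for dtype in (np.float64, np.float32, np.uint8, np.bool_):
    arr = np.zeros(n, dtype=dtype)
    print(np.dtype(dtype).name, arr.nbytes)   # bytes used by n elements

# Integer dtypes give integer semantics, e.g. floor division.
counts = np.array([3, 4, 5], dtype=np.int64)
print(counts // 2)                            # [1 2 2]

# An 8-bit greyscale image is naturally a 2-D uint8 array.
image = np.zeros((480, 640), dtype=np.uint8)
print(image.nbytes)                           # 307200 bytes
print(image.astype(np.float64).nbytes)        # 2457600 bytes as float64
```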

 20-100GB is pretty ambitious and I guess reflects the maturity of
 NumPy -  I'd be happy with good handling of 100MB matrices right
 now.

100MB is pretty darn small these days -- if you're only interested in
smallish problems, then you can probably forget about performance
issues, and focus on a really nice API. But I'm not sure I'd bother
with that -- once people start using it, they'll want to use it for
big problems!

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Dag Sverre Seljebotn
On 01/09/2013 04:41 PM, Benjamin Root wrote:


 On Wed, Jan 9, 2013 at 9:58 AM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, Jan 9, 2013 at 2:53 PM, Alan G Isaac alan.is...@gmail.com wrote:
   I'm just a Python+NumPy user and not a CS type.
   May I ask a naive question on this thread?
  
   Given the work that has (as I understand it) gone into
   making NumPy usable as a C library, why is the discussion not
   going in a direction like the following:
   What changes to the NumPy code base would be required for it
   to provide useful ndarray functionality in a C extension
   to Clojure?  Is this simply incompatible with the goal that
   Clojure compile to JVM byte code?

 IIUC that work was done on a fork of numpy which has since been
 abandoned by its authors, so... yeah, numpy itself doesn't have much
 to offer in this area right now. It could in principle with a bunch of
 refactoring (ideally not on a fork, since we saw how well that went),
 but I don't think most happy current numpy users are wishing they
 could switch to writing Lisp on the JVM or vice-versa, so I don't
 think it's surprising that no-one's jumped up to do this work.


 If I could just point out that the attempt to fork numpy for the .NET
 work was done back in the subversion days, and there was little-to-no
 effort to incrementally merge back changes to master, and vice-versa.
 With git as our repository now, such work may be more feasible.

This is a matter of personal software design taste I guess, so the 
following is very subjective.

I don't think there's anything at all to gain from this.  In 2013 (and 
presumably, the future), a static C or C++ library is IMO fundamentally 
incompatible with achieving optimal performance. Going through a major 
refactor simply to end up with something that's no faster and no more 
flexible than what NumPy is today seems sort of pointless to me.

What one wants is to generate ufuncs etc. on the fly using LLVM that are 
tuned to the specific tiling pattern of a specific operation, not a 
static C or C++ library (even with C++ meta-programming, the 
combinatorial explosion kills you if you do it all at compile-time).

Granted, one could probably write a C++ library that was more of a 
compiler, using LLVM to emit code. But that's starting all over so not 
really relevant to the question of a NumPy refactor.

This is how I understand Continuum thinks too, with Numba as a back-end 
for Blaze. (And Travis also spoke about this in his farewell address.)

Finally, Mark Florisson sort of started this with the 'minivect' library
last summer, which could serve as a ufunc backend both for Cython and Numba
(which for this purpose are different languages); however, as I
understand it the focus is now more on developing Numba directly rather than
minivect (which is understandable as that's quicker).

Dag Sverre


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Chris Barker - NOAA Federal
On Wed, Jan 9, 2013 at 2:57 AM, Mike Anderson

 I'm hoping the API will be independent of storage format - i.e. the
 underlying implementations can store the data any way they like. So the API
 will be written in terms of abstractions, and the user will have the choice
 of whatever concrete implementation best fits the specific needs. Sparse
 matrices, tiled matrices etc. should all be possible options.

A note about that -- as I think of it, numpy arrays are two things:

1) a python object for working with numbers, in a wide variety of ways

2) a wrapper around a C-array (or data block) that can be used to
provide an easy way for Python to interact with C (and Fortran, and...)
libraries, etc.

As it turns out a LOT of people use numpy for (2) -- what this means
is that while you could change the underlying data representation,
etc, and keep the same Python API -- such changes would break a lot of
non-pure-python code that relies on that data representation.
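
To make point (2) concrete, here is a minimal sketch of an array acting as a thin view over memory it does not own. The bytearray is only a stand-in (an assumption for illustration) for a data block that would normally come from a C or Fortran library.

```python
import numpy as np

# A plain Python buffer standing in for memory owned by a C library
# (hypothetical stand-in; real code would get this from an extension).
raw = bytearray(4 * 8)  # room for four float64 values

# Wrap the existing memory without copying it.
a = np.frombuffer(raw, dtype=np.float64)
a[:] = [1.0, 2.0, 3.0, 4.0]

# A second wrapper over the same bytes sees the values written through `a`.
b = np.frombuffer(raw, dtype=np.float64)
print(b)
print(np.shares_memory(a, b))

# Layout details that non-Python code typically relies on.
print(a.dtype, a.strides, a.flags['C_CONTIGUOUS'])
```

Code on the C side usually depends on exactly these details (item type, strides, contiguity), which is why changing the representation behind the same Python API would break it.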

This is a big issue with the numpy-for-PyPy project -- they could
write a numpy clone, but it would only be useful for the pure-python
stuff.

Even then, a number of folks do tricks with numpy arrays in python
that rely on the underlying structure.
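
One common example of such a trick, sketched under the assumption of a reasonably recent NumPy (the `writeable` argument of `as_strided` is not available in very old releases): overlapping windows built purely by reinterpreting strides.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(6, dtype=np.int64)      # contiguous, 8-byte items
step = x.strides[0]

# Overlapping windows of length 3, built purely from strides (no copy).
# Read-only, because several window elements alias the same memory.
windows = as_strided(x, shape=(4, 3), strides=(step, step), writeable=False)
print(windows)
print(np.shares_memory(x, windows))
```

This only works because the array is known to be a strided view over one flat data block; a clone with a different storage scheme could not support it.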

Not sure how all this would play out for Clojure, but it's something
to keep in mind.

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE  (206) 526-6329   fax
Seattle, WA  98115      (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Mike Anderson
On 9 January 2013 23:09, Alan G Isaac alan.is...@gmail.com wrote:

 On 1/9/2013 9:58 AM, Nathaniel Smith wrote:
  I don't think most happy current numpy users are wishing they
  could switch to writing Lisp on the JVM or vice-versa, so I don't
  think it's surprising that no-one's jumped up to do this work.


 Sure.  I'm trying to look at this more from the Clojure end.
 Is it really better to start from scratch than to attempt
 a contribution to NumPy that would make it useful to Clojure.
 Given the amount of work that has gone into making NumPy
 what it is, it seems a huge project for the Clojure people
 to hope to produce anything comparable starting from scratch.

 Thanks,
 Alan


Currently I expect that the Clojure community will produce an abstraction /
API for matrices / ndarrays that supports multiple implementations. It's
fairly idiomatic in Clojure to work in abstractions, and the language
offers good tools for making different concrete implementations work with a
common API, so it's less hard to make this work than it might sound.
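
As a rough illustration of the idea (in Python rather than Clojure, and with entirely hypothetical class and method names), user code can be written against a small abstract interface while the concrete storage is chosen separately:

```python
from abc import ABC, abstractmethod
import numpy as np

class MatrixBackend(ABC):
    """Hypothetical interface; the names are illustrative only."""
    @abstractmethod
    def from_rows(self, rows): ...
    @abstractmethod
    def matmul(self, a, b): ...
    @abstractmethod
    def to_rows(self, m): ...

class ListBackend(MatrixBackend):
    """Generic fallback implementation using nested Python lists."""
    def from_rows(self, rows):
        return [list(map(float, r)) for r in rows]
    def matmul(self, a, b):
        cols = list(zip(*b))
        return [[sum(x * y for x, y in zip(row, col)) for col in cols]
                for row in a]
    def to_rows(self, m):
        return [list(r) for r in m]

class NumpyBackend(MatrixBackend):
    """Implementation that delegates to NumPy."""
    def from_rows(self, rows):
        return np.array(rows, dtype=np.float64)
    def matmul(self, a, b):
        return np.dot(a, b)
    def to_rows(self, m):
        return m.tolist()

def square(backend, rows):
    """User code written only against the abstract API."""
    m = backend.from_rows(rows)
    return backend.to_rows(backend.matmul(m, m))

rows = [[1, 2], [3, 4]]
result_list = square(ListBackend(), rows)
result_numpy = square(NumpyBackend(), rows)
print(result_list == result_numpy)
```

In Clojure the same shape would more naturally be expressed with protocols; the sketch only shows that the calling code never touches the storage format.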

An interface to NumPy could certainly be one of the implementations of this
API - I'm sure people would find this very useful given the maturity of
NumPy and the need for integration in environments
with heterogeneous systems.

At the same time, there will be people in the Clojure world who will want
to stay 100% on the JVM for certain projects. For them I don't see how
NumPy could be used, unless it can be made to run well on Jython perhaps?


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-07 Thread Chris Barker - NOAA Federal
On Thu, Jan 3, 2013 at 10:29 PM, Mike Anderson
mike.r.anderson...@gmail.com wrote:
 In the Clojure community there has been some discussion about creating a
 common matrix maths library / API. Currently there are a few different
 fledgeling matrix libraries in Clojure, so it seemed like a worthwhile
 effort to unify them and have a common base on which to build on.

 NumPy has been something of an inspiration for this, so I though I'd ask
 here to see what lessons have been learned.

A few thoughts:

 We're thinking of a matrix library

First -- is this a matrix library, or a general use nd-array
library? That will drive your design a great deal. For my part, I came
from MATLAB, which started out very focused on matrixes, then extended
to be more generally useful. Personally, I found the matrix-focus to
get in the way more than help -- in any real code, the actual
matrix operations are likely to be a tiny fraction of the code.

One reason I like numpy is that it is array-first, with secondary
support for matrix stuff.

That being said, there is the numpy matrix type, and there are those
that find it very useful, particularly in teaching situations, though
it feels a bit tacked-on, and that does get in the way. So if you
want a real matrix object, but also a general purpose array lib,
thinking about both up front will be helpful.

 - Support for multi-dimensional matrices (but with fast paths for 1D vectors
 and 2D matrices as the common cases)

What is a multi-dimensional matrix? Is it a 3-d something, a stack of
matrixes, or something else? (Note, numpy lacks this kind of object,
but it is sometimes asked for -- i.e. a way to do fast matrix
multiplication with a lot of small matrixes.)
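
For reference, a sketch of what "a stack of matrixes" can look like with a plain 3-d array, using `np.einsum` to multiply all pairs in one call. The array names and sizes are made up for illustration; current NumPy releases also let `np.matmul` broadcast over leading dimensions in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 3, 3))   # 1000 small 3x3 matrices
b = rng.standard_normal((1000, 3, 3))

# One call multiplies every pair a[n] . b[n]; no Python-level loop.
stacked = np.einsum('nij,njk->nik', a, b)

# Reference result, one matrix at a time.
looped = np.array([np.dot(a[n], b[n]) for n in range(len(a))])
print(np.allclose(stacked, looped))
```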

I think fast paths for 1-D and 2-D are secondary, though you may want
easy paths for those. In particular, if you want good support for
linear algebra (matrixes), then having a clean and natural row vector
and column vector would be nice. See the archives of this list for
a bunch of discussion about that -- and what the weaknesses are of the
numpy matrix object.
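
A short sketch of the row/column vector issue with plain ndarrays: a 1-D array is neither, so the orientation has to be added explicitly.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])      # 1-D: neither a row nor a column
print(v.shape, v.T.shape)          # transposing a 1-D array changes nothing

row = v[np.newaxis, :]             # shape (1, 3)
col = v[:, np.newaxis]             # shape (3, 1)

inner = np.dot(row, col)           # (1, 1) array holding the dot product
outer = np.dot(col, row)           # (3, 3) outer product
print(inner.shape, outer.shape)
```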

 - Immutability by default, i.e. matrix operations are pure functions that
 create new matrices.

I'd be careful about this -- the purity and predictability is nice,
but these days a lot of time is spent allocating and moving memory
around -- numpy arrays' mutability is a key feature -- indeed,
the key issues with performance with numpy surround the fact that many
copies may be made unnecessarily (note, Dag's suggestion of lazy
evaluation may mitigate this to some extent).
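
A small sketch of the difference, using the ufunc `out=` argument and augmented assignment to reuse an existing buffer instead of allocating a new one (array sizes are arbitrary):

```python
import numpy as np

a = np.ones(1_000_000)
b = np.ones(1_000_000)

c = a + b                 # allocates a new 8 MB result array
print(np.shares_memory(a, c))

before = a                # keep a second reference to the same buffer
np.add(a, b, out=a)       # writes the sum into a's existing buffer
a += b                    # augmented assignment is also in place
print(before is a, a[0])
```

A strictly immutable API would have to pay for the first form every time, unless it can prove the old buffer is dead and recycle it.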

 - Support for 64-bit double precision floats only (this is the standard
 float type in Clojure)

Not a bad start, but another major strength of numpy is the multiple
data types - you may want to design that concept in from the start.
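
To illustrate why the element type matters, a quick sketch comparing memory use for the same number of elements, plus non-float arrays that share the same array type:

```python
import numpy as np

n = 1_000_000
as_f64 = np.zeros(n, dtype=np.float64)
as_f32 = np.zeros(n, dtype=np.float32)
as_u8 = np.zeros(n, dtype=np.uint8)     # e.g. image data

for arr in (as_f64, as_f32, as_u8):
    print(arr.dtype, arr.itemsize, arr.nbytes)

# Integer and complex arrays use the same array type and operations.
counts = np.array([1, 2, 3], dtype=np.int64)
phases = np.exp(1j * np.array([0.0, np.pi]))
print(counts.dtype, phases.dtype)
```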

 - Ability to support multiple different back-end matrix implementations
 (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)

This ties in to another major strength of numpy -- ndarrays are both
powerful python objects, and wrappers around standard C arrays -- that
makes it pretty darn easy to interface with external libraries for
core computation.
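
A minimal sketch of what "wrapper around a standard C array" means in practice, using only `ctypes` from the standard library. No real external library is called here; the pointer simply shows what a C function receiving the array would see.

```python
import ctypes
import numpy as np

a = np.array([1.0, 2.0, 3.0])           # float64, C-contiguous

# A raw `double *` into the array's data block, as a C function would see it.
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
print(ptr[0], ptr[1], ptr[2])

# Writing through the pointer changes the array: there is no copy in between.
ptr[1] = 20.0
print(a)
```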

HTH,
  -Chris




Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-04 Thread Dag Sverre Seljebotn
On 01/04/2013 07:29 AM, Mike Anderson wrote:
 Hello all,

 In the Clojure community there has been some discussion about creating a
 common matrix maths library / API. Currently there are a few different
 fledgeling matrix libraries in Clojure, so it seemed like a worthwhile
 effort to unify them and have a common base on which to build on.

 NumPy has been something of an inspiration for this, so I though I'd ask
 here to see what lessons have been learned.

 We're thinking of a matrix library with roughly the following design
 (subject to change!)
 - Support for multi-dimensional matrices (but with fast paths for 1D
 vectors and 2D matrices as the common cases)

Food for thought: Myself I have vectors that are naturally stored in 2D, 
matrices that can be naturally stored in 4D and so on (you can't view 
them that way when doing linear algebra, it's just that the indices can 
have multiple components) -- I like that NumPy calls everything array; 
I think vector and matrix are higher-level mathematical concepts.
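
A sketch of what this can look like, with made-up shapes: a "vector" whose index has two components (k, l) and a "matrix" mapping (k, l) to (i, j). The contraction can be done directly on the n-d arrays, or after flattening to an ordinary matrix and vector.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal((4, 5))          # a "vector" indexed by (k, l)
A = rng.standard_normal((2, 3, 4, 5))    # a "matrix" indexed by (i, j) and (k, l)

# Contract over the compound index (k, l) directly on the n-d arrays.
w = np.tensordot(A, v, axes=([2, 3], [0, 1]))     # shape (2, 3)

# The same product after flattening to an ordinary matrix and vector.
w_flat = np.dot(A.reshape(2 * 3, 4 * 5), v.reshape(4 * 5))

print(np.allclose(w, w_flat.reshape(2, 3)))
```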

 - Immutability by default, i.e. matrix operations are pure functions
 that create new matrices. There could be a backdoor option to mutate
 matrices, but that would be unidiomatic in Clojure

Sounds very promising (assuming you can reuse the buffer if the input 
matrix had no other references and is not used again?). It's very common 
for NumPy arrays to fill a large chunk of the available memory (think 
20-100 GB), so for those users this would need to be coupled with buffer 
reuse and good diagnostics that help remove references to old 
generations of a matrix.
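
For comparison, NumPy can approximate immutable-by-default arrays with its read-only flag. The sketch below uses hypothetical helper names (`frozen`, `scale`) and does not attempt the buffer reuse discussed above; every operation allocates a fresh array.

```python
import numpy as np

def frozen(arr):
    """Return a read-only array (hypothetical helper, not a NumPy API)."""
    out = np.array(arr, dtype=np.float64)   # private copy
    out.setflags(write=False)
    return out

def scale(m, factor):
    """Pure function: the input is left untouched, a new array is returned."""
    return frozen(m * factor)

m = frozen([[1.0, 2.0], [3.0, 4.0]])
m2 = scale(m, 10.0)

try:
    m[0, 0] = 99.0
    mutated = True
except ValueError:
    mutated = False   # NumPy refuses writes to a read-only array
print(mutated, m2)
```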

 - Support for 64-bit double precision floats only (this is the standard
 float type in Clojure)
 - Ability to support multiple different back-end matrix implementations
 (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
 - A full range of matrix operations. Operations would be delegated to
 back end implementations where they are supported, otherwise generic
 implementations could be used.

 Any thoughts on this topic based on the NumPy experience? In particular
 would be very interesting to know:
 - Features in NumPy which proved to be redundant / not worth the effort
 - Features that you wish had been designed in at the start
 - Design decisions that turned out to be a particularly big mistake /
 success

 Would love to hear your insights, any ideas+advice greatly appreciated!

Travis Oliphant noted some of his thoughts on this in the recent thread
"DARPA funding for Blaze and passing the NumPy torch", which is a must-read.

Dag Sverre


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-04 Thread Dag Sverre Seljebotn
On 01/04/2013 09:00 AM, Dag Sverre Seljebotn wrote:
 On 01/04/2013 07:29 AM, Mike Anderson wrote:
 Hello all,

 In the Clojure community there has been some discussion about creating a
 common matrix maths library / API. Currently there are a few different
 fledgeling matrix libraries in Clojure, so it seemed like a worthwhile
 effort to unify them and have a common base on which to build on.

 NumPy has been something of an inspiration for this, so I though I'd ask
 here to see what lessons have been learned.

 We're thinking of a matrix library with roughly the following design
 (subject to change!)
 - Support for multi-dimensional matrices (but with fast paths for 1D
 vectors and 2D matrices as the common cases)

 Food for thought: Myself I have vectors that are naturally stored in 2D,
 matrices that can be naturally stored in 4D and so on (you can't view
 them that way when doing linear algebra, it's just that the indices can
 have multiple components) -- I like that NumPy calls everything array;
 I think vector and matrix are higher-level mathematical concepts.

 - Immutability by default, i.e. matrix operations are pure functions
 that create new matrices. There could be a backdoor option to mutate
 matrices, but that would be unidiomatic in Clojure

 Sounds very promising (assuming you can reuse the buffer if the input
 matrix had no other references and is not used again?). It's very common
 for NumPy arrays to fill a large chunk of the available memory (think
 20-100 GB), so for those users this would need to be coupled with buffer
 reuse and good diagnostics that help remove references to old
 generations of a matrix.

Oh: Depending on your ambitions, it's worth thinking hard about i)
storage format, and ii) lazy evaluation.

Storage format: The new trend is for more flexible formats than just 
column-major/row-major, e.g., storing cache-sized n-dimensional tiles.
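
A rough sketch of a tiled layout for the 2-D case, emulated on top of NumPy with hypothetical helpers (`to_tiles`, `from_tiles`). It assumes both dimensions are exact multiples of the tile size; a real tiled format would also handle ragged edges and more dimensions.

```python
import numpy as np

def to_tiles(a, tile):
    """Copy a 2-D array into tile-major order (hypothetical helper).

    The result has shape (rows/tile, cols/tile, tile, tile) and each
    tile is contiguous in memory.
    """
    rows, cols = a.shape
    blocks = a.reshape(rows // tile, tile, cols // tile, tile)
    return np.ascontiguousarray(blocks.transpose(0, 2, 1, 3))

def from_tiles(t):
    """Inverse of to_tiles: rebuild the ordinary row-major array."""
    br, bc, tile, _ = t.shape
    return t.transpose(0, 2, 1, 3).reshape(br * tile, bc * tile)

a = np.arange(16).reshape(4, 4)
t = to_tiles(a, 2)
print(t[0, 1])       # the top-right 2x2 tile
print(t.ravel())     # memory order: one whole tile after another
```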

Lazy evaluation: The big problem with numpy is that a + b + np.sqrt(c) 
will first make a temporary result for a + b, rather than doing the 
whole expression on the fly, which is *very* bad for performance.
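
One way to see the alternative, sketched in plain NumPy for 1-D float arrays: evaluate the expression block by block so that the temporaries are small enough to stay in cache. The function name and chunk size are assumptions for illustration; this is the general idea behind tools like Numexpr, not their actual implementation.

```python
import numpy as np

def fused_chunks(a, b, c, chunk=4096):
    """Evaluate a + b + sqrt(c) block by block (illustrative sketch).

    Temporaries are at most `chunk` elements long instead of full-size.
    Assumes 1-D arrays of equal length.
    """
    out = np.empty_like(a)
    scratch = np.empty(chunk, dtype=a.dtype)
    for start in range(0, a.size, chunk):
        stop = min(start + chunk, a.size)
        tmp = scratch[:stop - start]
        np.sqrt(c[start:stop], out=tmp)
        np.add(a[start:stop], b[start:stop], out=out[start:stop])
        out[start:stop] += tmp
    return out

rng = np.random.default_rng(0)
a, b, c = rng.random((3, 100_000))
result = fused_chunks(a, b, c)
print(np.allclose(result, a + b + np.sqrt(c)))
```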

So if you want immutability, I urge you to consider having every operation
build up an expression tree/program, and then either work out the
"smart" points at which to interpret that program automatically, or make
explicit eval() of an expression tree the default mode.
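
A toy sketch of the expression-tree approach with an explicit evaluation step. All class and function names are hypothetical; operations only build nodes, and `evaluate` walks the tree one block at a time so no full-size temporary is created.

```python
import numpy as np

class Expr:
    """Node in a tiny lazy expression tree (illustrative, not a real API)."""
    def __add__(self, other):
        return Add(self, other)

class Leaf(Expr):
    def __init__(self, array):
        self.array = array
    def eval_block(self, sl):
        return self.array[sl]

class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval_block(self, sl):
        return self.left.eval_block(sl) + self.right.eval_block(sl)

class Sqrt(Expr):
    def __init__(self, arg):
        self.arg = arg
    def eval_block(self, sl):
        return np.sqrt(self.arg.eval_block(sl))

def evaluate(expr, size, chunk=4096):
    """Explicit eval: interpret the tree over one block at a time."""
    out = np.empty(size)
    for start in range(0, size, chunk):
        sl = slice(start, min(start + chunk, size))
        out[sl] = expr.eval_block(sl)
    return out

rng = np.random.default_rng(0)
a, b, c = rng.random((3, 10_000))
lazy = Leaf(a) + Leaf(b) + Sqrt(Leaf(c))   # builds a tree, computes nothing
result = evaluate(lazy, a.size)
print(np.allclose(result, a + b + np.sqrt(c)))
```

Because nothing is computed until `evaluate` is called, the inputs can stay immutable while the evaluator is free to choose blocking, fusion, or a different back end.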

Of course this all depends on how ambitious you are.

It's probably best to have a look at all the projects designed in order 
to get around NumPy's short-comings:

  - Blaze (in development, continuum.io)
  - Theano
  - Numexpr

Related:

  - HDF chunks
  - To some degree Cython

Dag Sverre


 - Support for 64-bit double precision floats only (this is the standard
 float type in Clojure)
 - Ability to support multiple different back-end matrix implementations
 (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
 - A full range of matrix operations. Operations would be delegated to
 back end implementations where they are supported, otherwise generic
 implementations could be used.

 Any thoughts on this topic based on the NumPy experience? In particular
 would be very interesting to know:
 - Features in NumPy which proved to be redundant / not worth the effort
 - Features that you wish had been designed in at the start
 - Design decisions that turned out to be a particularly big mistake /
 success

 Would love to hear your insights, any ideas+advice greatly appreciated!

 Travis Oliphant noted some of his thoughts on this in the recent thread
 DARPA funding for Blaze and passing the NumPy torch which is a must-read.

 Dag Sverre



[Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-03 Thread Mike Anderson
Hello all,

In the Clojure community there has been some discussion about creating a
common matrix maths library / API. Currently there are a few different
fledgeling matrix libraries in Clojure, so it seemed like a worthwhile
effort to unify them and have a common base on which to build on.

NumPy has been something of an inspiration for this, so I though I'd ask
here to see what lessons have been learned.

We're thinking of a matrix library with roughly the following design
(subject to change!)
- Support for multi-dimensional matrices (but with fast paths for 1D
vectors and 2D matrices as the common cases)
- Immutability by default, i.e. matrix operations are pure functions that
create new matrices. There could be a backdoor option to mutate matrices,
but that would be unidiomatic in Clojure
- Support for 64-bit double precision floats only (this is the standard
float type in Clojure)
- Ability to support multiple different back-end matrix implementations
(JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
- A full range of matrix operations. Operations would be delegated to back
end implementations where they are supported, otherwise generic
implementations could be used.

Any thoughts on this topic based on the NumPy experience? In particular
would be very interesting to know:
- Features in NumPy which proved to be redundant / not worth the effort
- Features that you wish had been designed in at the start
- Design decisions that turned out to be a particularly big mistake /
success

Would love to hear your insights, any ideas+advice greatly appreciated!

   Mike.