On 07/20/2011 06:32 AM, someone wrote:
Hi,

For those who haven't read my latest blog post: I'm working on
implementing a new structure for the matrices module of SymPy.
I ran into many design decisions along the way, such as what
should be named what, and what should live where.

Should matrices and linear algebra be separated?
So, here you need to be clear on what you think of as "linear
algebra". I know what the difference is mathematically, but I am not
entirely clear that you are meaning the same thing.

To me, linear algebra is a theoretical way of looking at things.  In
the finite dimensional case, linear operators are equivalent to
matrices.
That's the concise mathematical point. A more practice-oriented way to look
at this is to use "matrix" for all the basic stuff like storage etc.;
essentially, just take "matrix" as a data structure with very little
additional functionality, and then call things like solve, inverse,
the BLAS and LAPACK routines etc. "linalg".

This was more my point of view.


There are two main reasons why you would want to do "linear
algebra" without directly using matrices:
- Infinite dimensional vector spaces.  This is probably not really
much of a consideration in the CAS view, but to be sure, matrix
representations only work in finite dimensional spaces.
Depends on what you want to do and how to get there. Often enough I
wish that a CAS could work at a more abstract level than what's usual today.
(Thus I'm very interested in the topic of "abstract" matrices without "data";
this goes in that direction.) But I'm drifting too far off topic now.

- Looking at things with respect to matrices requires you to have a
basis.  This is where I think some stuff that does direct things with
linear operators without the user having to manually use matrices
would be useful.  For example, suppose we had a linear operator F(p) =
p' + 2*p on the set Q[x] of polynomials in x.  This set is infinite
dimensional, but we can restrict ourselves to only polynomials of a
certain degree (depending on the problem), so matrix representations
are useful.  But actually constructing the matrix of this operator is
not trivial.  First, you need a basis.  Polynomials have a standard
basis, but in some cases you might want to work in some other basis.
Second, you must rewrite an arbitrary input in terms of this basis.
Third, you must convert this into a Matrix.  It would be awesome if
you could just tell SymPy that this is your operator and that it
operates over, say, the set P2 of polynomials of degree 2 or lower.
And it would also be awesome if sympy could return the matrices given
the operator (and another basis if a default one doesn't fit). :-)
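As a concrete sketch of what that construction looks like when done by hand in current SymPy (the operator F, the choice of basis, and the helper `coords` are all just illustrative assumptions, not existing API):

```python
from sympy import Matrix, Poly, diff, symbols

x = symbols('x')

def F(p):
    # the example operator F(p) = p' + 2*p
    return diff(p, x) + 2*p

# standard basis of P2, the polynomials of degree <= 2
basis = [x**0, x, x**2]
n = len(basis)

def coords(p):
    # coordinates of p in the standard basis, ascending powers
    c = Poly(p, x).all_coeffs()[::-1]
    return c + [0] * (n - len(c))

# column j holds the coordinates of F(basis[j])
M = Matrix([coords(F(b)) for b in basis]).T

# applying M to the coordinate vector of p now gives
# the coordinates of F(p)
```

This is exactly the three manual steps described above: pick a basis, rewrite images in that basis, assemble the Matrix.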


What should be the public API ?
I see two "parts" of a public API. Some operations are more like
member functions while others are more independent of a concrete
matrix and should be implemented as free functions. (This is also
where we go more into the direction of procedural code.)
I also see two parts of the public API, but in a different sense.
There is the user level stuff.  This is pretty straight forward, as we
already basically have something that works, and we just need to adapt
the new stuff to work transparently with it.
Yes and no. I think we should take the opportunity and
streamline the API where it's necessary.

Just a small example, there are three methods for computing
determinants. The "simple" entry point is "det", but then there
is also "det_bareis" which uses another algorithm. Why does this
method exist at user level if "det" has the keyword argument "method"?
And there is even a third method "berkowitz_det" but why are the two
parts in the name switched here? If we really need the specialized
methods, then I as a user would expect "det_berkowitz" ...
But I think that at user level, "det" with a reasonable default
for "method" is just fine.

This only to show why I think we should not just copy the API over.
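For illustration, this is roughly what the single user-level entry point looks like, using the keyword spelling of current SymPy as an assumption:

```python
from sympy import Matrix

A = Matrix([[1, 2], [3, 4]])

# one entry point, algorithm chosen via a keyword argument
d1 = A.det()                    # reasonable default
d2 = A.det(method='berkowitz')  # explicit algorithm choice
```

Both calls compute the same determinant; only the algorithm differs, so the specialized names add nothing at user level.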


There is also the API for someone who wants to extend the module, say
by writing a new matrix algorithm or representation.  This is much
less easy to get right.
That's another important point, yes. I was mainly talking about the
user level API.


But this also depends a little on the question if the matrix M
is mutable here. For mutables, the member functions are better,
but for immutable objects maybe the independent functions are
more natural.
Mutability is an interesting point to raise here, because currently
the mutable Matrix class is not changed by M.T or M.invert().  So
actually the question arises as to exactly which things should be done
mutably.  Personally, I think something should be mutable only if it
saves time or resources.  If you have to basically reconstruct the
Matrix, the operation might as well be immutable, as that is what
people will expect anyway.
It depends on whether you take more the mathematician's or the programmer's view.
But you are right, most of the time matrices should be considered
as immutable at the user level. However, this is not true at lower
levels with most algorithms operating on matrices. And it is also
not true on the numerics side where slice assignment is very common.
(But we leave numerics to numpy anyway.)

And setting a single entry (or a whole row/col) of a matrix should
be possible. I imagine this could be useful for studying and
developing numerical methods.

But this would go below your first point in any case.
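A small sketch of that split, using today's SymPy class names as an assumption (Matrix is mutable, ImmutableMatrix is not):

```python
from sympy import Matrix, ImmutableMatrix

M = Matrix([[1, 2], [3, 4]])
M[0, 0] = 5            # entry assignment works on the mutable class
T = M.T                # returns a new matrix; M itself is unchanged

I = ImmutableMatrix([[1, 2], [3, 4]])
try:
    I[0, 0] = 5        # forbidden on the immutable class
except TypeError:
    pass               # raises, as expected
```

So even on the mutable class, transpose and inverse behave immutably; only operations where in-place modification actually saves work (entry assignment) mutate.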

  I think this will always hold for inverse, and for transpose it
depends on the internal data type.  Actually, that is something
else to consider, that you cannot completely ignore the internal data
type, as some operations are inherently more efficient in one way for
one data type than in a completely different way for a different data
type.
That's why we have different storage formats and algorithms.
But we should hide this from the user as much as possible.
Just expose "Matrix" and maybe "SparseMatrix" for special demands
but not the different storage schemes for sparse matrices.

By expose I mean what's visible to the user first. Of course
you can always import more low-level stuff ...

Independent of storage, all square matrices should, e.g., provide
a way to compute the inverse, even if it's horribly slow.
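As a sketch with today's classes (assuming Matrix and SparseMatrix are the two exposed types): whatever the storage scheme, the user-level operations look identical.

```python
from sympy import Matrix, SparseMatrix

# same entries, two storage schemes
S = SparseMatrix(3, 3, {(0, 0): 1, (1, 1): 2, (2, 2): 5})
D = Matrix(S)          # dense copy

# same user-level API regardless of storage
det_s, det_d = S.det(), D.det()
inv_s, inv_d = S.inv(), D.inv()
```

The storage format only changes which algorithm runs underneath, not what the user can ask for.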

  The classic example of this is when you have an efficiency
gain by doing things mutably.  This is, e.g., why list.sort()
modifies the list in place.
If we go to rectangular matrices, some of the above functions
are not defined, so the question is how we want to handle this?

And we also get "new" functions, e.g. singular values, pseudoinverses
and all those generalizations. These would then also be member
functions and reduce to the "usual" functions if the matrix is
square. (Or do we also want a method to compute the singular
values of a square matrix?)

To show a code snippet:

        def singularvalues(self, ...):
                if not self.is_square():
                        return singularvalues
                else:
                        # Two choices:
                        return eigenvalues
                        # or
                        raise NotImplementedError
I would say raise NotImplementedError.  Explicit is definitely better
than implicit in this case.
And not only in this case :-)

If singularvalues() returns eigenvalues and eigenvalues() returns singular
values, then it will be impossible for the user to tell which she is getting.
No, depending on the matrix shape and the function name
it would be unambiguous. But it's probably bad design anyway.

Having a rectangular matrix throw a NotImplementedError for
.eigenvalues() is a bit awkward too, isn't it?

And by the way, I think NotImplementedError gives the wrong impression,
as the operation is not just unimplemented but really mathematically
undefined.
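A quick illustration of why silently returning one for the other would be bad design: even for a square matrix the two are different in general (the SymPy method names used here are assumptions):

```python
from sympy import Matrix

A = Matrix([[0, 1], [0, 0]])    # square, but not normal

evals = A.eigenvals()           # both eigenvalues are zero
svals = A.singular_values()     # square roots of eigenvalues of A.T * A
```

Here the eigenvalues are {0, 0} while the singular values are {1, 0}, so a user relying on one quietly standing in for the other would get wrong answers.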


There is another side to this, which is that functions would pollute
the namespace.  Having functions in the namespace is not an inherently
bad thing.
The question is *which* namespace. We could put this in sympy.linalg
for example. So a user can import with "as" and avoid any clash.

  The main problem is that these functions operate on Matrix
objects, whereas most functions in SymPy operate on Expr objects.  It
depends on the name of course, but something like *_solve() and
variants would be very confusing to the user against the normal SymPy
solve() function.  Of course, we could name things matrix_solve(), but
at that point, you might as well just make them methods of Matrix.
This is where strong typing comes in handy ... solve(eqn, vars) can
coexist with solve(A, b).
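A minimal sketch of that coexistence, assuming a hypothetical top-level solve() that dispatches on argument type (this dispatcher is not SymPy's actual API; only LUsolve and the expression solver are real):

```python
from sympy import Matrix, symbols
from sympy import solve as expr_solve
from sympy.matrices import MatrixBase

def solve(a, b):
    # hypothetical dispatcher (illustration only):
    # matrices go to a linear solver, expressions to the symbolic solver
    if isinstance(a, MatrixBase):
        return a.LUsolve(b)      # solve A*x = b
    return expr_solve(a, b)
```

Python has no static overloading, so the "strong typing" here has to be an explicit runtime type check, which is one argument for keeping the matrix variant as a method instead.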

  So I actually vote to have almost all if not all matrix operations as
methods to Matrix.

This might get hairy if there is some operation that has more than two
operands, in which case just having a function might make sense.
So this should guide the API design, namely what gives more natural
notation while working with these objects and functions.

Also, member functions bring an asymmetry to the expression. Members
are good for "unary" operations like trace, det, inv etc., and maybe
for "binary" operations with (very) different operands: A.solve(b)
is not so nice but also not too bad. A.matrixmult(B) is worse because
of this asymmetry. And for n-ary operations, member functions will
make a mess, I think. (And maybe cause difficulties in implementing
some properties.)


Another important topic is iteration. I think that matrices really
should support iteration. And there are at least three cases which
are useful:

for row in matrix.iterrows():
        ...

for col in matrix.itercols():
        ...

for cell in matrix.iterentries():
        ...
Should Matrix.__iter__ just iterate over the entries, so "for cell in
matrix" is equivalent to "for cell in matrix.iterentries()"?
Oh, I just misused this iter*() naming in a dict-style way to make
clear what I'm talking about. The question of which of these gets the
"native" __iter__ is still open. I'm not sure whether iterating over
columns as vectors isn't more important than iterating over single
entries. On the other hand, the most basic iteration in collections
is always over single units ...

And the answer is probably: explicit is better than implicit ...

Or there is an efficient (maybe lazy) transpose operation, and then
if, e.g., iterrows() is efficient, itercols() replaced by
M.T.iterrows() is somewhat efficient too? I don't know ...
I think this depends on the internal representation of the Matrix.
Of course.
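The three iteration styles can be sketched as free generator functions over today's Matrix (the iter* names are the ones proposed above, not existing API):

```python
from sympy import Matrix

def iterrows(M):
    # yield each row as a 1 x n matrix
    for i in range(M.rows):
        yield M[i, :]

def itercols(M):
    # yield each column as an m x 1 matrix
    for j in range(M.cols):
        yield M[:, j]

def iterentries(M):
    # yield single entries, row by row
    for i in range(M.rows):
        for j in range(M.cols):
            yield M[i, j]
```

With this layout, itercols(M) and iterrows(M.T) yield the same vectors up to transposition, which is the equivalence mentioned above; how cheap that is depends on the internal representation.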


-- Raoul

You might like to look at the following links -

    http://faculty.luther.edu/~macdonal/laga/

and

    www.garretstar.com/secciones/publications/docs/GJFINPDF.PDF

