[racket-users] Re: Methods for least squares

2016-12-02 Thread Bradley Lucier
For the benefit of people on the Racket Users list, we're discussing the 
Conjugate Gradient (CG) method applied to Least Squares problems, as 
described in the original paper by Hestenes and Stiefel:


http://www.math.purdue.edu/~lucier/jresv49n6p409_A1b.pdf

The formulas to use are (10:2) on page 16 of the file (page 424 in the 
original).  (I get only the Racket Users digest, and that information 
wasn't included there.)
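
For anyone reading this in the digest, here they are; as I read it, (10:2) is
(up to the symbols the paper uses) the least-squares form of CG.  Starting
from $x_0$ with $r_0 = b - Ax_0$ and $p_0 = A^*r_0$, iterate

  $\alpha_k = \|A^*r_k\|^2 / \|Ap_k\|^2$
  $x_{k+1} = x_k + \alpha_k p_k$
  $r_{k+1} = r_k - \alpha_k Ap_k$
  $\beta_k = \|A^*r_{k+1}\|^2 / \|A^*r_k\|^2$
  $p_{k+1} = A^*r_{k+1} + \beta_k p_k$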


On 12/02/2016 01:52 AM, John Clements wrote:


> Okay, I’ve taken a crack at implementing this… well, I have a toy
> implementation, that doesn’t do any checking and only works for a particular
> matrix shape.
>
> Short version: yep, it works!
>
> Now, a few questions.
>
> 1) What should one use as the initial estimate, x_0 ? The method appears to
> work fine for an initial estimate that is uniformly zero. Is there any reason
> not to use this?


x_0=0 works fine.


> 2) When does one stop?


In exact arithmetic, the algorithm stops when $A^*r_k$, where $r_k$ is 
the residual at the $k$th step, is the zero vector.


In exact arithmetic, this is guaranteed to happen when $k$ is less than 
or equal to the number of columns of $A$.
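
(One way to see the bound: in exact arithmetic the search directions
$p_0, p_1, \ldots$ are mutually conjugate with respect to $A^*A$, that is,
$p_i^* A^* A p_j = 0$ for $i \neq j$, and there can be at most $n$ nonzero
mutually conjugate directions when $A$ has $n$ columns.)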


In floating-point arithmetic, it is not guaranteed that $A^*r_k$ is ever 
the zero vector.



> I have not read the paper carefully, but it appears that it’s intended to
> “halt” in approximately ’n’ steps, where ’n’ is the … number of rows?


The number of columns.


> In the one case I tried, my “direction” vector p_i quickly dropped to
> something on the order of [1e-16, 1e-16], and in fact it became perfectly
> stable, in that successive iterations produced precisely the same values for
> the three iteration variables. However, it wouldn’t surprise me to discover
> that there were cases on which the answer oscillated between two minutely
> different values. Can you shed any light on when it’s safe to stop in the
> general case?


If the columns of $A$ are normalized to all have size about 1, then I 
would stop when $\|A^*r_k\|$ is about round-off size, or when $k$ hits 
the number of columns, whichever is earlier.
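
If it helps, here is a rough sketch of the whole iteration in Racket's
math/matrix, with that stopping rule built in.  (Untested sketch: the
function name cg-least-squares and the #:tol default are just choices made
for illustration; the matrix operations are the documented math/matrix ones.)

#lang racket
(require math/matrix)

;; Least-squares CG: A is m x n, b is an m x 1 column matrix.
;; Returns an n x 1 column matrix approximating argmin_x ||Ax - b||.
;; Stops when ||A^* r_k|| <= tol or when k reaches (matrix-num-cols A).
(define (cg-least-squares A b #:tol [tol 1e-14])
  (define At (matrix-transpose A))
  (define n  (matrix-num-cols A))
  (define s0 (matrix* At b))                    ; A^* r_0, taking x_0 = 0 so r_0 = b
  (let loop ([k     0]
             [x     (make-matrix n 1 0)]        ; x_0 = 0
             [r     b]                          ; residual b - A x_k
             [p     s0]                         ; search direction p_k
             [gamma (matrix-dot s0 s0)])        ; ||A^* r_k||^2
    (cond
      [(or (= k n) (<= (sqrt gamma) tol)) x]
      [else
       (let* ([q      (matrix* A p)]                    ; A p_k
              [alpha  (/ gamma (matrix-dot q q))]       ; step length alpha_k
              [x*     (matrix+ x (matrix-scale p alpha))]
              [r*     (matrix- r (matrix-scale q alpha))]
              [s*     (matrix* At r*)]                  ; A^* r_{k+1}
              [gamma* (matrix-dot s* s*)]
              [p*     (matrix+ s* (matrix-scale p (/ gamma* gamma)))])
         (loop (add1 k) x* r* p* gamma*))])))

;; Example: a 3 x 2 system whose least-squares solution is (2/3, 1/2).
;; (cg-least-squares (matrix [[1 1] [1 2] [1 3]]) (col-matrix [1 2 2]))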


In statistical data analysis, many people compute only a few steps, 
think of the search direction vectors $p_k$ as "factors", and derive 
some information from this.  With only one RHS vector $b$ this is the 
same as Partial Least Squares (PLS).



> 3) Do you have any idea how this technique compares to any described in
> Numerical Recipes or implemented in LAPACK ?


I'm not an expert in numerical linear algebra or least squares problems. 
 It appears that LAPACK uses a so-called $QR$ factorization of $A$ to 
solve linear least squares problems.  I don't know how to compare them.
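
For what it's worth, math/matrix appears to expose a QR factorization
(matrix-qr), so the LAPACK-style approach can be sketched in a couple of
lines.  (Again untested; I'm assuming matrix-qr's optional boolean argument
selects the reduced/thin factorization, and qr-least-squares is just an
illustrative name.)

;; Least squares via a thin QR factorization: A = QR with R square and
;; upper triangular, so the minimizer solves R x = Q^* b.
(define (qr-least-squares A b)
  (define-values (Q R) (matrix-qr A #f))   ; #f requests the reduced factorization
  (matrix-solve R (matrix* (matrix-transpose Q) b)))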


Matlab probably has its own methods.

There is an iterative method called LSQR described here:

https://web.stanford.edu/class/cme324/paige-saunders2.pdf

It is in some sense a more stable version of CG (in exact arithmetic 
they generate the same sequence of approximations $x_k$), and it is 
compared to CG in section 7 of that paper.


I'm not saying that CG is a particularly good method to solve this 
problem, just that forming and solving $A^* Ax=A^* b$ is a particularly 
*bad* method.
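
(One standard way to quantify "particularly bad": for full-rank $A$, the
2-norm condition number satisfies $\kappa(A^*A) = \kappa(A)^2$, so forming
the normal equations squares the sensitivity of the system you hand to the
solver, whereas CG, LSQR, and QR all work with $A$ itself.)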


Brad



Re: [racket-users] Re: Methods for least squares

2016-12-02 Thread Jens Axel Søgaard
In case you get interested in implementing fitting for non-linear problems
as well, consider taking a look at this survey: "Methods for non-linear
least squares problems" by K. Madsen, H.B. Nielsen, O. Tingleff.


http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3215/pdf/imm3215.pdf

The first section is on descent algorithms.

/Jens Axel


2016-12-02 7:52 GMT+01:00 'John Clements' via Racket Users <
racket-users@googlegroups.com>:

>
> > On Dec 1, 2016, at 12:07, Bradley Lucier  wrote:
> >
> > On 12/01/2016 02:04 PM, John Clements wrote:
> >>
> >> Would it be all right with you if I shared your mail with the mailing
> list? A brief reading of this paper shows me the relationship between
> solving this problem and approximation of gaussian elimination, and I’d
> like to ask Jens Axel Soegaard and Neil Toronto how this relates to the
> general algorithms for matrix solution.
> >
> > Yes, you can share it.
> >
> > The Conjugate Gradient method is applicable to solving $Ax=b$ when $A$
> is a symmetric, positive definite matrix.  It's not applicable to the
> problem with general $A$.
> >
> > So, in fact, it's applicable to solving $A^*Ax=A^*b$, the normal
> equations for least squares, since $A^*A$ is symmetric and positive
> definite.  The brilliance of the method is that it doesn't actually compute
> $A^*A$, but only $Ay$ and $A^*y$ for various vectors $y$.
> >
> > The method is particularly useful when the linear system $Ax=b$ has
> inherent error in $A$ and $b$, and so one requires only an approximation to
> the solution $x$.
>
> Okay, I’ve taken a crack at implementing this… well, I have a toy
> implementation, that doesn’t do any checking and only works for a
> particular matrix shape.
>
> Short version: yep, it works!
>
> Now, a few questions.
>
> 1) What should one use as the initial estimate, x_0 ? The method appears
> to work fine for an initial estimate that is uniformly zero. Is there any
> reason not to use this?
>
> 2) When does one stop? I have not read the paper carefully, but it appears
> that it’s intended to “halt” in approximately ’n’ steps, where ’n’ is the …
> number of rows? In the one case I tried, my “direction” vector p_i quickly
> dropped to something on the order of [1e-16, 1e-16], and in fact it became
> perfectly stable, in that successive iterations produced precisely the same
> values for the three iteration variables. However, it wouldn’t surprise me
> to discover that there were cases on which the answer oscillated between
> two minutely different values. Can you shed any light on when it’s safe to
> stop in the general case?
>
> 3) Do you have any idea how this technique compares to any described in
> Numerical Recipes or implemented in LAPACK ?
>
> John
>



-- 
Jens Axel Søgaard



[racket-users] Re: Methods for least squares

2016-12-01 Thread 'John Clements' via Racket Users

> On Dec 1, 2016, at 12:07, Bradley Lucier  wrote:
> 
> On 12/01/2016 02:04 PM, John Clements wrote:
>> 
>> Would it be all right with you if I shared your mail with the mailing list? 
>> A brief reading of this paper shows me the relationship between solving this 
>> problem and approximation of gaussian elimination, and I’d like to ask Jens 
>> Axel Soegaard and Neil Toronto how this relates to the general algorithms 
>> for matrix solution.
> 
> Yes, you can share it.
> 
> The Conjugate Gradient method is applicable to solving $Ax=b$ when $A$ is a 
> symmetric, positive definite matrix.  It's not applicable to the problem with 
> general $A$.
> 
> So, in fact, it's applicable to solving $A^*Ax=A^*b$, the normal equations 
> for least squares, since $A^*A$ is symmetric and positive definite.  The 
> brilliance of the method is that it doesn't actually compute $A^*A$, but only 
> $Ay$ and $A^*y$ for various vectors $y$.
> 
> The method is particularly useful when the linear system $Ax=b$ has inherent 
> error in $A$ and $b$, and so one requires only an approximation to the 
> solution $x$.

Okay, I’ve taken a crack at implementing this… well, I have a toy 
implementation, that doesn’t do any checking and only works for a particular 
matrix shape.  

Short version: yep, it works!

Now, a few questions.

1) What should one use as the initial estimate, x_0 ? The method appears to 
work fine for an initial estimate that is uniformly zero. Is there any reason 
not to use this?

2) When does one stop? I have not read the paper carefully, but it appears that 
it’s intended to “halt” in approximately ’n’ steps, where ’n’ is the … number 
of rows? In the one case I tried, my “direction” vector p_i quickly dropped to 
something on the order of [1e-16, 1e-16], and in fact it became perfectly 
stable, in that successive iterations produced precisely the same values for 
the three iteration variables. However, it wouldn’t surprise me to discover 
that there were cases on which the answer oscillated between two minutely 
different values. Can you shed any light on when it’s safe to stop in the 
general case?

3) Do you have any idea how this technique compares to any described in 
Numerical Recipes or implemented in LAPACK ?

John


