[racket-users] Re: Methods for least squares
For the benefit of people on the Racket Users list, we're discussing the Conjugate Gradient (CG) method applied to Least Squares problems, as described in the original paper by Hestenes and Stiefel: http://www.math.purdue.edu/~lucier/jresv49n6p409_A1b.pdf The formulas to use are (10:2) on page 16 of the file (page 424 in the original). (I get only the Racket Users digest, and that information wasn't included there.) On 12/02/2016 01:52 AM, John Clements wrote: Okay, I’ve taken a crack at implementing this… well, I have a toy implementation, that doesn’t do any checking and only works for a particular matrix shape. Short version: yep, it works! Now, a few questions. 1) What should one use as the initial estimate, x_0 ? The method appears to work fine for an initial estimate that is uniformly zero. Is there any reason not to use this? x_0=0 works fine. 2) When does one stop? In exact arithmetic, the algorithm stops when $A^*r_k$, where $r_k$ is the residual at the $k$th step, is the zero vector. In exact arithmetic, this is guaranteed to happen when $k$ is less than or equal to the number of columns of $A$. In floating-point arithmetic, it is not guaranteed that $A^*r_k$ is ever the zero vector. I have not read the paper carefully, but it appears that it’s intended to “halt” in approximately ’n’ steps, where ’n’ is the … number of rows? The number of columns. In the one case I tried, my “direction” vector p_i quickly dropped to something on the order of [1e-16, 1e-16], and in fact it became perfectly stable, in that successive iterations produced precisely the same values for the three iteration variables. However, it wouldn’t surprise me to discover that there were cases on which the answer oscillated between two minutely different values. Can you shed any light on when it’s safe to stop in the general case? If the columns of $A$ are normalized to all have size about 1, then I would stop when $\|A^*r_k\|$ is about round-off size, or when $k$ hits the number of columns, whichever is earlier. In statistical data analysis, many people compute only a few steps, think of the the search direction vectors $p_k$ as "factors" and derive some information from this. With only one RHS vector $b$ this is the same as Partial Least Squares (PLS). 3) Do you have any idea how this technique compares to any described in Numerical Recipes or implemented in LAPACK ? I'm not an expert in numerical linear algebra or least squares problems. It appears that LAPACK uses a so-called $QR$ factorization of $A$ to solve linear least squares problems. I don't know how to compare them. Matlab probably has its own methods. There is an iterative method called LSQR described here: https://web.stanford.edu/class/cme324/paige-saunders2.pdf It is in some sense a more stable version of CG (in exact arithmetic they generate the same sequence of approximations $x_k$), and it is compared to CG in section 7 of that paper. I'm not saying that CG is a particularly good method to solve this problem, just that forming and solving $A^* Ax=A^* b$ is a particularly *bad* method. Brad -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [racket-users] Re: Methods for least squares
In case you get interested in implementing fitting for non-linear problems as well, consider taking a look at this survey: "Methods for non-linear least squares problems" by K. Madsen, H.B. Nielsen, O. Tingleff. http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3215/pdf/imm3215.pdf The first section is on descent algorithms. /Jens Axel 2016-12-02 7:52 GMT+01:00 'John Clements' via Racket Users < racket-users@googlegroups.com>: > > > On Dec 1, 2016, at 12:07, Bradley Lucier wrote: > > > > On 12/01/2016 02:04 PM, John Clements wrote: > >> > >> Would it be all right with you if I shared your mail with the mailing > list? A brief reading of this paper shows me the relationship between > solving this problem and approximation of gaussian elimination, and I’d > like to ask Jens Axel Soegaard and Neil Toronto how this relates to the > general algorithms for matrix solution. > > > > Yes, you can share it. > > > > The Conjugate Gradient method is applicable to solving $Ax=b$ when $A$ > is a symmetric, positive definite matrix. It's not applicable to the > problem with general $A$. > > > > So, in fact, it's applicable to solving $A^*Ax=A^*b$, the normal > equations for least squares, since $A^*A$ is symmetric and positive > definite. The brilliance of the method is that it doesn't actually compute > $A^*A$, but only $Ay$ and $A^*y$ for various vectors $y$. > > > > The method is particularly useful when the linear system $Ax=b$ has > inherent error in $A$ and $b$, and so one requires only an approximation to > the solution $x$. > > Okay, I’ve taken a crack at implementing this… well, I have a toy > implementation, that doesn’t do any checking and only works for a > particular matrix shape. > > Short version: yep, it works! > > Now, a few questions. > > 1) What should one use as the initial estimate, x_0 ? The method appears > to work fine for an initial estimate that is uniformly zero. Is there any > reason not to use this? > > 2) When does one stop? I have not read the paper carefully, but it appears > that it’s intended to “halt” in approximately ’n’ steps, where ’n’ is the … > number of rows? In the one case I tried, my “direction” vector p_i quickly > dropped to something on the order of [1e-16, 1e-16], and in fact it became > perfectly stable, in that successive iterations produced precisely the same > values for the three iteration variables. However, it wouldn’t surprise me > to discover that there were cases on which the answer oscillated between > two minutely different values. Can you shed any light on when it’s safe to > stop in the general case? > > 3) Do you have any idea how this technique compares to any described in > Numerical Recipes or implemented in LAPACK ? > > John > > > > -- > You received this message because you are subscribed to the Google Groups > "Racket Users" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to racket-users+unsubscr...@googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -- -- Jens Axel Søgaard -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[racket-users] Re: Methods for least squares
> On Dec 1, 2016, at 12:07, Bradley Lucier wrote: > > On 12/01/2016 02:04 PM, John Clements wrote: >> >> Would it be all right with you if I shared your mail with the mailing list? >> A brief reading of this paper shows me the relationship between solving this >> problem and approximation of gaussian elimination, and I’d like to ask Jens >> Axel Soegaard and Neil Toronto how this relates to the general algorithms >> for matrix solution. > > Yes, you can share it. > > The Conjugate Gradient method is applicable to solving $Ax=b$ when $A$ is a > symmetric, positive definite matrix. It's not applicable to the problem with > general $A$. > > So, in fact, it's applicable to solving $A^*Ax=A^*b$, the normal equations > for least squares, since $A^*A$ is symmetric and positive definite. The > brilliance of the method is that it doesn't actually compute $A^*A$, but only > $Ay$ and $A^*y$ for various vectors $y$. > > The method is particularly useful when the linear system $Ax=b$ has inherent > error in $A$ and $b$, and so one requires only an approximation to the > solution $x$. Okay, I’ve taken a crack at implementing this… well, I have a toy implementation, that doesn’t do any checking and only works for a particular matrix shape. Short version: yep, it works! Now, a few questions. 1) What should one use as the initial estimate, x_0 ? The method appears to work fine for an initial estimate that is uniformly zero. Is there any reason not to use this? 2) When does one stop? I have not read the paper carefully, but it appears that it’s intended to “halt” in approximately ’n’ steps, where ’n’ is the … number of rows? In the one case I tried, my “direction” vector p_i quickly dropped to something on the order of [1e-16, 1e-16], and in fact it became perfectly stable, in that successive iterations produced precisely the same values for the three iteration variables. However, it wouldn’t surprise me to discover that there were cases on which the answer oscillated between two minutely different values. Can you shed any light on when it’s safe to stop in the general case? 3) Do you have any idea how this technique compares to any described in Numerical Recipes or implemented in LAPACK ? John -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.