Re: [Jprogramming] Question about least squares with %.

Fraser Jackson Tue, 04 May 2021 02:03:33 -0700

Concerns about using %.  For solution of linear regression  models avoid the 
need in nearly all applications to do some careful diagnostics after the 
calculation.  Regression is one of the main tools I use J for and the simple 
function listed below returns a result and sufficient information  about the 
calculation to enable a range of diagnostic tools to be constructed.  In 
generating the QR decomposition, the process fails if the are linear relations 
amon the variables.   
It is also usefule to have information for generating a range of plots to like 
the scale plot, the line and data points, normal quantile plots for residuals,  
 These are a set of functions which I hope we can add to the stats addons.  
regN  does the core of constructing a result variable.  The x is a vector of 
dependent avariable values, and y is a matrix of regressors.   If you want a 
constant term in the regression you must append that to the data matri 1,.y     
Some simple functions reg  and reg0 deal with a range of argument s which can 
incorporate the variable names of the regressors.


regN =:   4 : 0
assert. isvector x
assert. (#x) = #y
'q r'=. 128!:0 y
rinv =. 128!:1  r
g=. (|:q) +/ . * ,x 
b=. rinv +/ . * g
yest=. q +/ . * g
yres=. x  - yest
xtxinv=. rinv +/ . * |: rinv
s1 =. +/ *: dev yest
s2 =. +/ *: yres
s3 =. +/ *: dev x
ss =. s1,s2,s3
hi =. +/"1 *:q
if. RMEANS
 do.
yest;yres;b;xtxinv;(mean x);(mean y);hi;ss
 else. 
yest;yres;b;xtxinv;x;y;hi;ss
end.
)

-----Original Message-----
From: Programming <programming-boun...@forums.jsoftware.com> On Behalf Of Hauke 
Rehr
Sent: Tuesday, May 4, 2021 8:49 AM
To: programm...@jsoftware.com
Subject: Re: [Jprogramming] Question about least squares with %.

One might also want to trade Householder for Givens, at least in certain 
circumstances. Would be nice to • automatically decide based on the matrix’s 
properties • have a way to set flags before using %.


Am 03.05.21 um 22:25 schrieb Henry Rich:
> Can anyone make a recommendation of one (or maybe two) 
> freely-available codes that I could use in place of the current QR 
> code?  LAPACK would be OK.  I don't know the field & I don't have the 
> time right now to look deeply into it.
> 
> I have used Householder matrices for finding eigenvalues, & I think 
> they would be cache-unfriendly for QR decomposition. Something more 
> block-oriented would be preferable.  But as I said I don't know the field.
> 
> Henry Rich
> 
> On 5/3/2021 3:56 PM, Imre Patyi wrote:
>> Thank you.
>>
>> I have played with implementing a QR algorithm in J, called qrhh for 
>> QR via Householder reflections.
>> Please find it attached along with some tests.
>>
>> The given QR algorithm qrhh solves all the 5 test cases attached 
>> while the built in %. solves none of them with comparable or even 
>> acceptable accuracy.
>>
>> I am more and more convinced that the built-in %. is numerically very 
>> unstable.  It would be better to find a reputable QR implementation 
>> that is numerically stable.  I am sure there are nice implementations 
>> of QR in C freely available that are numerically stable.  Any 
>> mathematically correct QR algorithm can solve small examples, but to 
>> solve examples of useful size, we need to have a numerically stable 
>> algorithm.  Most plain vanilla math algorithms have very little 
>> chance of being numerically stable (i.e., they magnify the roundoff 
>> errors rather than keeping them more or less constant), it is not for 
>> nothing that numerical analysts exist.
>>
>> The rosetta code site has a C implementation with Householder 
>> reflections.
>> I have not tried testing it, but it is likely to beat J's built-in %. 
>> by several orders of magnitude of accuracy if the implementation is 
>> correct.
>> The recursive algorithm given also on the same page of rosetta code 
>> for J (and mentioned in this thread by Raul Miller) is based upon a 
>> clever rewriting of the classical Gram--Schmidt orthogonalization 
>> process, which is well-known to be numerically unstable and every 
>> book on numerical analysis recommends against using it.  While a 
>> numerically unstable algorithm (of which 128!:0 is probably an 
>> example) can be accurate on some examples, it often fails on similar 
>> examples for no clear reason.
>> See Example 3 in the attachment, where
>>
>> NB.      'q r'=: 128!:0 X
>> NB.      >./|,((|:q)(+/ .*)q)-(=/~i.1{$q) NB. 0.999904
>>
>> 128!:0 produces an "orthogonal" matrix Q with Q'Q-I having an entry 
>> nearly 1 (all should be nearly zero).  Here X is a 1000 by 21 real 
>> matrix, with not too large or too small numbers.  When %. solves a 
>> linear least squares problem Ax=b, or QRx=b, it relies on Q'Q=I in 
>> order to solve Rx=Q'b by backward substitution.  If Q'Q is not nearly 
>> I, it ( %. ) is likely to fail badly.
>>
>> https://rosettacode.org/wiki/QR_decomposition#C
>>
>> https://code.jsoftware.com/wiki/Essays/QR_Decomposition
>>
>> https://en.wikipedia.org/wiki/QR_decomposition
>>
>> Book, "Numerical recipes in C".
>>
>> I would like to request that the J maintainers replace the current %. 
>> with a numerically stable one.  QR decomposition is a well entrenched 
>> small program, and it would likely be easy enough to find a suitable 
>> numerically stable replacement.  It would cost you little to fix it.
>>
>> Thank you very much.
>>
>> Yours sincerely,
>> Imre Patyi
>>
>>
>> On Sun, May 2, 2021 at 11:37 PM Henry Rich <henryhr...@gmail.com> wrote:
>>
>>> J does not use long double.
>>>
>>> It is possible that 8.07 used 8087 instructions with 80-bit 
>>> floating-point; I don't remember.
>>>
>>> The QR algorithm in JE is a clever recursive algorithm that Roger 
>>> invented.  It uses +/ . * to do almost all the calculations; 
>>> reciprocals are performed only at the bottom of the recursion, on 
>>> 2x2 or 1x1 matrices.
>>>
>>> At the lower levels of recursion it works with short, wide matrices 
>>> that the blocking matrix-multiply code can't grip efficiently, so I 
>>> added a version of +/ . * that does matrix multiplication via inner 
>>> products two at a time: just the thing for such cases.  The overall 
>>> algorithm is unchanged.
>>>
>>> The differences you report look pretty small to me, and could be 
>>> artifacts of differing orders of summation.
>>>
>>> I have seen references to what Python does, but I don't know what 
>>> its
>>> qr() code is defined to do.  If the matrix is ill-conditioned, 
>>> perhaps some extra processing is performed.
>>>
>>> Henry Rich
>>>
>>>
>>>
>> ---------------------------------------------------------------------
>> - For information about J forums see 
>> http://www.jsoftware.com/forums.htm
> 
> 

--
----------------------
mail written using NEO
neo-layout.org

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Question about least squares with %.

Reply via email to