Dear Something, I think you have no hope of completing this project without going back to an introductory regression text. The two models you listed (Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + · · · + bkXk) and (Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +........ + c) are identical except for the typos. Cohen and Cohen [1] might be a good place to start.
Best, Ista [1] http://www.amazon.com/Multiple-Regression-Correlation-Analysis-Behavioral/dp/0805822232 On Sat, Feb 13, 2010 at 8:03 PM, Something Something <mailinglist...@gmail.com> wrote: >>>From your question it is difficult to determine what sort of tutoring you > are expecting. > The kind of tutoring you are providing is definitely helpful :) > > I guess I should quickly explain what I am trying to accomplish. I have > been a computer scientist for quite a few years, but I studied Statistics > long time ago. Now trying to get back into the Statistics field. > > For my latest project I have been asked to implement Multiple Regression > Analysis (using interactions - I guess) in Java & Distributed Computing > (more specifically Hadoop). I looked at the possibility of using rJava, but > it's 'single threaded' model plus complex deployment on 1000s of machines > does not make it very appealing. I looked at using 3rd party libraries such > as 'Apache Commons Math' & Flanagan's JAVA statistics library. Both of them > use this formula: > Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + · · · + bkXk > > and NOT the one I have been asked to implement - which is - > Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +........ + c > > (Not to mention these tools do not use BigDecimal so the answers given by > them are not as precise as those from R). > > So now the deadline is approaching... and I can't find any Java library that > matches results from R.. so I have to roll up my sleeves and start coding in > Java - which means I need to get up to speed on Y-Hat calculations very > quickly. Ideally, looking for a paper (or a text book) that gives step by > step instructions on calculating coefficients such as (b0, b1... b13). I > have been searching but haven't found anything. My last resort is to look > at R source code and start converting it to Java. > > I am new to R and a little rusty on Statistics, so I apologize for all the > dumb questions, and GREATLY appreciate your patience and help. Thanks. > > > On Sat, Feb 13, 2010 at 3:20 PM, David Winsemius > <dwinsem...@comcast.net>wrote: > >> >> On Feb 13, 2010, at 5:03 PM, Something Something wrote: >> >> I tried.. >>> >>> mod = lm(Y ~ X1*X2*X3, na.action = na.exclude) >>> formula(mod) >>> >>> This produced.... >>> Y ~ X1 * X2 * X3 >>> >>> >>> When I typed just mod I got: >>> >>> Call: >>> lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude) >>> >>> Coefficients: >>> (Intercept) X11 X21 X31 X11:X21 >>> X11:X31 >>> X21:X31 X11:X21:X31 >>> 177.9245 0.2005 2.4482 3.1216 0.8127 -26.6166 >>> -3.0398 29.6049 >>> >>> >>> I am trying to figure out how R computed all these coefficients. >>> >> >> From your question it is difficult to determine what sort of tutoring you >> are expecting. To get the code of an R formula, you just type its name: >> >> lm >> >> Leads to lm.fit: >> >> lm.fit >> >> Reading further it appears the lm and lm.fit functions are really front >> ends for this call: >> >> .Fortran("dqrls", qr = x, n = n, p = p, y = y, ny = ny, >> tol = as.double(tol), coefficients = mat.or.vec(p, ny), >> residuals = y, effects = y, rank = integer(1L), pivot = 1L:p, >> qraux = double(p), work = double(2 * p), PACKAGE = "base") >> >> Seems pretty likely that is a QR decomposition-based method that i >> implemented in compiled code. >> >> So if you want to go deeper, at least you know what to search for. Or if >> you want to know how regression works on a matrix level, you should consult >> a good reference text or Wikipedia, which is surprisingly good for that sort >> of question these days. >> >> -- >> David. >> >>> >>> >>> >>> >>> >>> On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter <gunter.ber...@gene.com> >>> wrote: >>> >>> ?formula >>>> >>>> >>>> Bert Gunter >>>> Genentech Nonclinical Statistics >>>> >>>> -----Original Message----- >>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] >>>> On >>>> Behalf Of Something Something >>>> Sent: Saturday, February 13, 2010 1:24 PM >>>> To: Daniel Nordlund >>>> Cc: r-help@r-project.org >>>> Subject: Re: [R] lm function in R >>>> >>>> Thanks Dan. Yes that was very helpful. I didn't see the change from '*' >>>> to >>>> '+'. >>>> >>>> Seems like when I put * it means - interaction & when I put + it's not an >>>> interaction. >>>> >>>> Is it correct to assume then that... >>>> >>>> When I put + R evaluates the following equation: >>>> Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + 7 7 7 + bkXk >>>> >>>> >>>> But when I put * R evaluates the following equation; >>>> Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +........ + c >>>> >>>> Is this correct? If it is then can someone point me to any sources that >>>> will explain how the coefficients (such as b0... bk, b12.. , b123..) are >>>> calculated. I guess, one source is the R source code :) but is there any >>>> other documentation anywhere? >>>> >>>> Please let me know. Thanks. >>>> >>>> >>>> >>>> On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund >>>> <djnordl...@verizon.net>wrote: >>>> >>>> -----Original Message----- >>>>>> From: r-help-boun...@r-project.org [mailto: >>>>>> >>>>> r-help-boun...@r-project.org] >>>> >>>>> On Behalf Of Something Something >>>>>> Sent: Friday, February 12, 2010 5:28 PM >>>>>> To: Phil Spector; r-help@r-project.org >>>>>> Subject: Re: [R] lm function in R >>>>>> >>>>>> Thanks for the replies everyone. Greatly appreciate it. Some >>>>>> >>>>> progress, >>>> >>>>> but >>>>>> now I am getting the following values when I don't use "as.factor" >>>>>> >>>>>> 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667 >>>>>> >>>>>> Is that what you guys get? >>>>>> >>>>> >>>>> >>>>> If you look at Phil's response below, no, that is not what he got. The >>>>> difference is that you are specifying an interaction, whereas Phil did >>>>> >>>> not >>>> >>>>> (because the equation you initially specified did not include an >>>>> interaction. Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula. >>>>> >>>>> >>>>>> >>>>>> On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector >>>>>> <spec...@stat.berkeley.edu>wrote: >>>>>> >>>>>> By converting the two variables to factors, you are fitting >>>>>>> an entirely different model. Leave out the as.factor stuff >>>>>>> and it will work exactly as you want it to. >>>>>>> >>>>>>> dat >>>>>>> >>>>>>>> >>>>>>>> V1 V2 V3 V4 >>>>>>> 1 s1 14 4 1 >>>>>>> 2 s2 23 4 2 >>>>>>> 3 s3 30 7 2 >>>>>>> 4 s4 50 7 4 >>>>>>> 5 s5 39 10 3 >>>>>>> 6 s6 67 10 6 >>>>>>> >>>>>>> names(dat) = c('id','y','x1','x2') >>>>>>>> z = lm(y~x1+x2,dat) >>>>>>>> predict(z) >>>>>>>> >>>>>>>> 1 2 3 4 5 6 15.16667 >>>>>>> >>>>>> 24.66667 >>>> >>>>> 27.66667 46.66667 40.16667 68.66667 >>>>>>> >>>>>>> >>>>>>> - Phil Spector >>>>>>> Statistical Computing >>>>>>> >>>>>> Facility >>>> >>>>> Department of Statistics >>>>>>> UC Berkeley >>>>>>> spec...@stat.berkeley.edu >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, 12 Feb 2010, Something Something wrote: >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>>> >>>>>>>> I am trying to learn how to perform Multiple Regression Analysis in >>>>>>>> >>>>>>> R. >>>> >>>>> I >>>>>> >>>>>>> decided to take a simple example given in this PDF: >>>>>>>> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf >>>>>>>> >>>>>>>> I created a small CSV called, students.csv that contains the >>>>>>>> >>>>>>> following >>>> >>>>> data: >>>>>>>> >>>>>>>> s1 14 4 1 >>>>>>>> s2 23 4 2 >>>>>>>> s3 30 7 2 >>>>>>>> s4 50 7 4 >>>>>>>> s5 39 10 3 >>>>>>>> s6 67 10 6 >>>>>>>> >>>>>>>> Col headers: Student id, Memory span(Y), age(X1), speech rate(X2) >>>>>>>> >>>>>>>> Now the expected results are: >>>>>>>> >>>>>>>> yHat[0]:15.166666666666668 >>>>>>>> yHat[1]:24.666666666666668 >>>>>>>> yHat[2]:27.666666666666664 >>>>>>>> yHat[3]:46.666666666666664 >>>>>>>> yHat[4]:40.166666666666664 >>>>>>>> yHat[5]:68.66666666666667 >>>>>>>> >>>>>>>> This is based on the following equation (given in the PDF): Y = >>>>>>>> >>>>>>> 1.67 >>>> >>>>> + >>>>> >>>>>> X1 >>>>>> >>>>>>> + >>>>>>>> 9.50 X2 >>>>>>>> >>>>>>>> I ran the following commands in R: >>>>>>>> >>>>>>>> data = read.table("students.csv", head=F, as.is=T, na.string=".", >>>>>>>> row.nam=NULL) >>>>>>>> X1 = as.factor(data[[3]]) >>>>>>>> X2 = as.factor(data[[4]]) >>>>>>>> Y = data[[2]] >>>>>>>> mod = lm(Y ~ X1*X2, na.action = na.exclude) >>>>>>>> Y.hat = fitted(mod) >>>>>>>> Y.hat >>>>>>>> >>>>>>>> This gives me the following output: >>>>>>>> >>>>>>>> Y.hat >>>>>>>> >>>>>>>>> >>>>>>>>> 1 2 3 4 5 6 >>>>>>>> 14 23 30 50 39 67 >>>>>>>> >>>>>>>> Obviously I am doing something wrong. Please help. Thanks. >>>>>>>> >>>>>>>> >>>>> Hope this is helpful, >>>>> >>>>> Dan >>>>> >>>>> Daniel Nordlund >>>>> Bothell, WA USA >>>>> >>>>> >>>>> ______________________________________________ >>>>> R-help@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>>> >>>> [[alternative HTML version deleted]] >>>> >>>> >>>> >>>> >>> [[alternative HTML version deleted]] >>> >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> David Winsemius, MD >> Heritage Laboratories >> West Hartford, CT >> >> > > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.