Re: [R] Step command failing for lm function
Hi Noah,

Are you able to reproduce the example on a smaller dataset? Do you have any strange variable names or ...? I created a 3 x 100 matrix, fit a linear model, and step has been running fine (other than bringing my poor netbook to its knees).

It also might be helpful if you could post your session info per the posting guide.

You could also try: debug(step). Then run step on your model so you can see what the function does before it exits.

Cheers,

Josh

On Jan 9, 2011, at 23:57, Noah Silverman n...@smartmediacorp.com wrote:

> Hi,
>
> I have a fairly simple linear regression using the lm function. There are about 100 variables and 30,000 rows of data. It runs fine and produces a decent looking R2 value.
>
> I'm interested in performing a stepwise variable selection to see if things can be cleaned up a bit.
>
> Calling the step function returns ONE iteration (all the variables) and then stops. No errors are reported.
>
> Can someone suggest why this might not be working as expected? (Normally this function steps through all the variables to find the best combination.)
>
> Thanks!
>
> -N

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
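A minimal sketch of the kind of small reproduction suggested above. The data are simulated, and the sizes (100 rows, 3 predictors), the names dat, fit and reduced, and the coefficient are assumptions chosen for illustration, not anything from the original model.

```r
# Small simulated data set; everything here is hypothetical.
set.seed(42)
dat <- as.data.frame(matrix(rnorm(100 * 3), nrow = 100))  # columns V1, V2, V3
dat$y <- 1 + dat$V1 + rnorm(100)                          # only V1 drives y

# Fit the full model from a data frame, then run stepwise selection.
fit <- lm(y ~ ., data = dat)
reduced <- step(fit)   # prints one AIC table per iteration

# Session details to include in a follow-up post.
print(sessionInfo())

# In an interactive session, step() can be traced line by line with:
#   debug(step); step(fit); undebug(step)
```

If step behaves on a small case like this but not on the real data, the difference between the two calls (how the predictors are passed to lm) is the first thing to compare.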
Re: [R] Step command failing for lm function
On 10.01.2011 10:13, Joshua Wiley wrote:

> On Jan 9, 2011, at 23:57, Noah Silverman n...@smartmediacorp.com wrote:
>
>> I have a fairly simple linear regression using the lm function. There are about 100 variables and 30,000 rows of data. It runs fine and produces a decent looking R2 value.
>>
>> I'm interested in performing a stepwise variable selection to see if things can be cleaned up a bit.
>>
>> Calling the step function returns ONE iteration (all the variables) and then stops. No errors are reported.

Can you show us both your code and the output as well as the summary of the whole model, please?

Uwe Ligges
Re: [R] Step command failing for lm function
Hi,

It's a lot of data, but here are some summary stats:

l <- lm(trainy ~ x)

str(x)
 num [1:31205, 1:48] 0.0975 -0.1987 0.3254 -0.7912 0.0975 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:31205] "5" "6" "7" "8" ...
  ..$ : NULL
 - attr(*, "names")= chr [1:1497840] "a" NA NA NA ...

summary(x)
       V1                   V2                   V3                   V4                   V5                   V6
  Min.   :-1.679848    Min.   :-1.606698    Min.   :-1.617491    Min.   :-1.6534404   Min.   :-0.93052     Min.   :-1.66594
  1st Qu.:-0.865216    1st Qu.:-0.867430    1st Qu.:-0.875567    1st Qu.:-0.9042894   1st Qu.:-0.67904     1st Qu.:-0.90768
  Median : 0.074739    Median :-0.004886    Median :-0.009924    Median : 0.0946436   Median :-0.40504     Median :-0.14942
  Mean   : 0.000492    Mean   :-0.001140    Mean   :-0.001563    Mean   :-0.0006543   Mean   :-0.01372     Mean   : 0.01700
  3rd Qu.: 0.826709    3rd Qu.: 0.857625    3rd Qu.: 0.855687    3rd Qu.: 0.8438270   3rd Qu.: 0.23305     3rd Qu.: 0.79841
  Max.   : 1.578680    Max.   : 1.596925    Max.   : 1.597644    Max.   : 1.5930105   Max.   : 2.74787     Max.   : 2.88363

       V7                   V8                   V9                   V10                  V11                  V12
  Min.   :-2.84607     Min.   :-17.340329   Min.   :-5.72374     Min.   :-9.088574    Min.   :-0.753625    Min.   :-9.694224
  1st Qu.:-0.69230     1st Qu.: -0.680686   1st Qu.:-0.77093     1st Qu.:-0.484832    1st Qu.:-0.753625    1st Qu.:-0.535022
  Median : 0.07690     Median : -0.050236   Median : 0.08103     Median : 0.127993    Median :-0.187126    Median : 0.094031
  Mean   :-0.01912     Mean   :  0.007672   Mean   :-0.01086     Mean   : 0.004137    Mean   : 0.001845    Mean   : 0.005425
  3rd Qu.: 0.69226     3rd Qu.:  0.643260   3rd Qu.: 0.70906     3rd Qu.: 0.646475    3rd Qu.: 0.232864    3rd Qu.: 0.640222
  Max.   : 1.76915     Max.   :  4.299870   Max.   : 3.87579     Max.   : 4.307299    Max.   : 8.125662    Max.   :13.955377

       V13                  V14                  V15                  V16                  V17                  V18
  Min.   :-2.325326    Min.   :-1.122704    Min.   :-15.78010    Min.   :-1.41451     Min.   :-2.890895    Min.   :-6.48201
  1st Qu.:-0.707599    1st Qu.:-0.677653    1st Qu.:  0.10818    1st Qu.:-0.67008     1st Qu.:-0.562810    1st Qu.:-0.65572
  Median : 0.022490    Median :-0.249277    Median :  0.29841    Median :-0.24738     Median :-0.068975    Median :-0.01222
  Mean   : 0.000984    Mean   : 0.005968    Mean   : -0.01914    Mean   :-0.01929     Mean   :-0.004446    Mean   :-0.04004
  3rd Qu.: 0.735969    3rd Qu.: 0.387072    3rd Qu.:  0.38232    3rd Qu.: 0.32839     3rd Qu.: 0.502638    3rd Qu.: 0.59069
  Max.   : 2.328877    Max.   :10.034416    Max.   :  1.17948    Max.   : 3.66491     Max.   : 3.405497    Max.   : 3.95314

       V19                  V20                  V21                  V22                  V23                  V24
  Min.   :-3.4866219   Min.   :-53.84720    Min.   :-3.872473    Min.   :-82.470612   Min.   :-0.877362    Min.   :-0.9064
  1st Qu.:-0.6866883   1st Qu.: -0.57941    1st Qu.:-0.459875    1st Qu.: -0.546812   1st Qu.:-0.556758    1st Qu.:-0.6743
  Median : 0.0181297   Median : -0.01640    Median :-0.026090    Median : -0.023271   Median :-0.283361    Median :-0.2101
  Mean   : 0.0005746   Mean   :  0.02152    Mean   : 0.001832    Mean   : -0.002836   Mean   : 0.006677    Mean   : 0.0330
  3rd Qu.: 0.7036093   3rd Qu.:  0.58834    3rd Qu.: 0.400639    3rd Qu.:  0.501094   3rd Qu.: 0.196238    3rd Qu.: 0.4863
  Max.   : 3.5553623   Max.   : 53.96102    Max.   : 5.111946    Max.   :  7.022679   Max.   :21.385854    Max.   :12.3242

       V25                  V26                  V27                  V28                  V29                  V30
  Min.   :-0.88375     Min.   :-1.11709     Min.   :-1.00780     Min.   :-10.7395     Min.   :-1.66934     Min.   :-1.0292617
  1st Qu.:-0.65752     1st Qu.:-0.71563     1st Qu.:-0.70467     1st Qu.: -0.1804     1st Qu.:-0.46190     1st Qu.:-0.6029130
  Median :-0.20505     Median :-0.07946     Median :-0.14171     Median :  0.2798     Median :-0.12636     Median :-0.3733405
  Mean   : 0.03226     Mean   : 0.02066     Mean   : 0.01787     Mean   : -0.0344     Mean   : 0.01104     Mean   : 0.0004641
  3rd Qu.: 0.47365     3rd Qu.: 0.48877     3rd Qu.: 0.42125     3rd Qu.:  0.5117     3rd Qu.: 0.32533     3rd Qu.: 0.0530082
  Max.   :10.88045     Max.   :11.39008     Max.   :11.55056     Max.   :  1.2400     Max.   :76.74103     Max.   : 5.4643580

       V31                  V32                  V33                  V34                  V35                  V36
  Min.   :-1.72330     Min.   :-2.81647     Min.   :-1.22587     Min.   :-1.33872     Min.   :-0.85680     Min.   :-1.84229
  1st Qu.:-0.95858     1st Qu.:-0.68389     1st Qu.:-0.79860     1st Qu.:-0.85541     1st Qu.:-0.66622     1st Qu.:-0.81453
  Median :-0.19386     Median : 0.07774     Median :-0.18821     Median :-0.18663     Median :-0.37654     Median :-0.25103
  Mean   : 0.01799     Mean   :-0.01678     Mean   : 0.01022     Mean   :-0.07883     Mean   :-0.05283     Mean
Re: [R] Step command failing for lm function
I think I just figured it out.

x is a matrix. l <- lm(y ~ x) works for generating a model, but step fails on it: it treats x as a single term to add or remove.

step does work if I use a data.frame:

foo <- as.data.frame(cbind(y, x))
l <- lm(y ~ ., data=foo)

Now step(l) works.

I guess R doesn't look inside the x in the first version to iterate through the different variables. It does, however, iterate when the . is used in a formula, since the . expands to one term per column of foo.
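The difference can be seen directly by comparing the term labels of the two fits. This is a small sketch on simulated data; the sizes, the coefficients, and the names fit_matrix, fit_df, sel_matrix and sel_df are assumptions made for illustration, not the original data.

```r
# Simulated stand-in for the real data (hypothetical sizes and effects).
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), nrow = n)   # 5 predictors, no column names
y <- 2 * x[, 1] - x[, 2] + rnorm(n)   # only columns 1 and 2 matter

# Matrix on the right-hand side: the model has a single term.
fit_matrix <- lm(y ~ x)
attr(terms(fit_matrix), "term.labels")   # "x"

# Data frame plus ".": one term per column.
foo <- data.frame(y = y, x)              # columns y, X1, ..., X5
fit_df <- lm(y ~ ., data = foo)
attr(terms(fit_df), "term.labels")       # "X1" "X2" "X3" "X4" "X5"

# step() can only add or drop whole terms, so the matrix fit offers it
# a single candidate, while the data frame fit offers five.
sel_matrix <- step(fit_matrix, trace = 0)
sel_df <- step(fit_df, trace = 0)
formula(sel_df)
```

With the matrix version, the only move available to step is dropping x entirely, which is why it reports one iteration and stops.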