It sounds like you are dealing with a missing-data problem. By default, lm and glm keep only observations with complete records (complete-case analysis). This can be problematic if many variables have missing values and the values are not missing completely at random (i.e., missingness depends on other measured or unmeasured variables, or on the missing values themselves). Imputation is a common tool for analyzing incomplete data. In R, you can find packages that perform single or multiple imputation, e.g. randomForest, norm, mice, mi, etc.
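To make the default behaviour concrete, here is a minimal sketch in R. The data are simulated, not the poster's: only the variable names (TDS, Cond) and the row counts (2206 and 1191) come from the question, while the linear relationship, the noise levels, and the assumption that Cond is missing completely at random are all made up for illustration. The multiple-imputation part uses mice with its default settings and runs only if that package is installed; treat it as one possible starting point rather than a recommended analysis.

```r
set.seed(42)

# Simulated stand-in for the data in the question (values are hypothetical):
# 2206 rows in total, of which only 1191 have a Cond measurement.
n    <- 2206
cond <- rnorm(n, mean = 500, sd = 100)
tds  <- 0.65 * cond + rnorm(n, sd = 20)
dat  <- data.frame(TDS = tds, Cond = cond)

# Assumption for this sketch: Cond is missing completely at random.
dat$Cond[sample(n, n - 1191)] <- NA

# Default behaviour: lm() silently drops every row with an NA in a
# model variable, i.e. it performs a complete-case analysis.
fit_cc     <- lm(TDS ~ Cond, data = dat)
n_complete <- sum(complete.cases(dat[, c("TDS", "Cond")]))
n_used     <- nobs(fit_cc)
cat("complete rows:", n_complete, " rows used by lm():", n_used, "\n")

# Optional: multiple imputation with mice, skipped if it is not installed.
if (requireNamespace("mice", quietly = TRUE)) {
  imp    <- mice::mice(dat, m = 5, printFlag = FALSE, seed = 42)
  fit_mi <- with(imp, lm(TDS ~ Cond))   # fit the model on each imputed data set
  print(summary(mice::pool(fit_mi)))    # combine the fits with Rubin's rules
}
```

Comparing the complete-case coefficients with the pooled ones is a quick check on how much the handling of missing values matters. In this simulation the two should agree closely because the missingness was generated completely at random, which real monitoring data may not satisfy.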
There is no easy way out with missing-data problems: all imputation methods rest on some strong and untestable assumptions.

Weidong Gu

On Fri, Oct 21, 2011 at 12:13 PM, Rich Shepard <rshep...@appl-ecosys.com> wrote:
> Because of regulatory requirement changes over several decades and weather
> conditions preventing site access the variables in my data set have
> different lengths. I'd like guidance on how to perform linear regressions
> and other models with these variables.
>
> For example, there are 2206 rows for the parameter "TDS" but only 1191
> rows for the parameter "Cond." Such discrepancies are common in these data.
>
> Is there a reference I can read to learn how to analyze such data?
>
> Rich
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.