It is the model matrix which is singular, *not* the variable. You are trying to fit a collinear model.
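As a minimal sketch of that distinction, the toy example below uses made-up data; `x1`, `x2`, `x3` and `y` are hypothetical and are not the poster's variables. Each column looks unremarkable on its own, yet the model matrix built from them is rank-deficient because `x3` is an exact linear combination of the others.

```r
# Hypothetical data: x3 is an exact linear combination of x1 and x2.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 2 * x1 - x2          # carries no information beyond x1 and x2
y  <- 1 + x1 + x2 + rnorm(n)

# On its own the variable looks perfectly ordinary ...
summary(x3)

# ... but the model matrix (intercept, x1, x2, x3) is rank-deficient:
X <- model.matrix(~ x1 + x2 + x3)
ncol(X)        # number of columns in the model matrix
qr(X)$rank     # numerical rank, one less than ncol(X) here

# lm() therefore leaves the coefficient of x3 undefined (NA).
fit <- lm(y ~ x1 + x2 + x3)
coef(fit)
```

Which of the collinear terms ends up undefined depends on the order of the terms in the formula, so summaries of the individual variables cannot explain why one was kept and another was not.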
Use alias() to see what is going on.

On Fri, 30 May 2003, Thomas Fischer wrote:

> Hello,
>
> I am trying to run a linear regression analysis on my data set. For some
> reason most variables are removed due to singularities.
>
> My linear regression looks this way (I am using only partial data, which
> is selected by flags):
>
> fm<-lm(log(cplex6.time..sec..[flags]) ~ cplex6.cities[flags] +
>     log(1/features.meanOver.frust[flags]) +
>     log(1/features.meanOver.minDist[flags]) +
>     [...]
>     avg..steps.to.loc..Opt..norm..[flags] + NN.List.opt..tour.max.[flags])
>
> As I am using inversion and logarithms I set all data to positive values,
> before running lm():
>
> cplex6.time..sec..[cplex6.time..sec..<=0.00001]=0.00001
> features.meanOver.frust[features.meanOver.frust<=0.00001]=0.00001
> features.meanOver.minDist[features.meanOver.minDist<=0.00001]=0.00001
> [...]
> features.varOver.varDist[features.varOver.varDist<=0.00001]=0.00001
>
> Retrieving the summary of fm, I get the message, that some coefficients
> have been removed.

No, that they are not defined, as it says.

> [...]
> Coefficients: (20 not defined because of singularities)
>                                                 Estimate Std. Error t value Pr(>|t|)
> (Intercept)                                      87.2162    44.1148   1.977   0.0511 .
> log(1/features.meanOver.frust[flags])            -2.5298     0.1515 -16.702   <2e-16 ***
> log(1/features.meanOver.minDist[flags])         154.7170    11.3917  13.582   <2e-16 ***
> log(1/features.meanOver.quant25Dist[flags])    -943.4625    71.3505 -13.223   <2e-16 ***
> log(1/features.meanOver.quart1SpanDist[flags])  776.1049    60.0571  12.923   <2e-16 ***
> log(1/features.meanOver.spanDist[flags])         -9.8069     0.1400 -70.038   <2e-16 ***
> log(1/features.meanOver.varDist[flags])         -11.3211     0.6715 -16.859   <2e-16 ***
> log(1/features.quant25Over.minDist[flags])      -46.9655     3.1438 -14.939   <2e-16 ***
> avg..steps.to.loc..Opt..norm..[flags]             0.8324     1.0919   0.762   0.4478
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> [...]
>
>
> The summary of one of the removed coefficients looks like this:

That's the summary of the variable, not the coefficient.

> > summary(features.spanOver.quart1SpanDist[flags])
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> 0.05584 0.05797 0.06366 0.06311 0.06674 0.07290
> > summary(log(1/features.spanOver.quart1SpanDist[flags]))
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   2.619   2.707   2.754   2.767   2.848   2.885
>
> The summary of a coefficient that was kept looks this way:
>
> > summary(features.quant25Over.minDist[flags])
>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
> 0.001030 0.001030 0.001030 0.001032 0.001030 0.001040
> > summary(log(1/features.quant25Over.minDist[flags]))
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   6.869   6.878   6.878   6.877   6.878   6.878
>
> So, I don't see the difference. Why has the first coefficient been
> removed and the second one kept?
> Please help me.
>
> I'm using R 1.6.2 on a Linux x86 machine.
>
> Greetings,
> Thomas Fischer
>
> ______________________________________________
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
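
For readers who have not used alias() before, here is a small hedged sketch of the call recommended in the reply. The data are invented for illustration (`x1`, `x2`, `x3`, `y` are hypothetical names), and the only assumption about the real problem is that some regressors are exact linear combinations of others.

```r
# Hypothetical data in which x3 is aliased with x1 and x2 by construction.
set.seed(2)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2
y  <- rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)

# alias() reports each undefined term as a linear combination of the
# terms that were retained; the Complete component holds that pattern.
a <- alias(fit)
a$Complete     # one row per aliased term, one column per retained term
```

Applied to a fit like `fm` in the question, the rows of the Complete component would name the 20 undefined terms and show which retained terms each one duplicates.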
