Re: [R] predict: remove columns with new levels automatically
Andreas Wittmann wrote:
> Sorry for my poor description. I don't want a ready-made algorithm
> without doing any work of my own; I only hoped to get some advice on
> how to do this. I don't want to predict arbitrary data: I refer only
> to newdata whose variables are the same as in the model data. But if
> there are factors in the data, it is possible that the newdata has a
> level which doesn't exist in the original data. So I have to compare
> each factor in the data and in the newdata, and if the newdata has a
> level which is not in the original data, drop this variable and
> compute the model and the prediction again. I thought this problem
> was quite common and that I could use an algorithm somebody had
> already implemented.
>
> best regards
> Andreas

If I understand correctly, you want to build a model that includes
at least one factor predictor (say xf with k levels). Then you want
to use this model to predict a response value when xf takes a _new_
level about which the model knows nothing. That doesn't make sense
to me, so I doubt that it's a common problem. Introducing a new level
for a factor variable is just like introducing a new variable.

 -Peter Ehlers

> -------- Original Message --------
> Date: Wed, 25 Nov 2009 00:48:59 -0500
> From: David Winsemius dwinsem...@comcast.net
> To: Andreas Wittmann andreas_wittm...@gmx.de
> CC: r-help@r-project.org
> Subject: Re: [R] predict: remove columns with new levels automatically
>
>> On Nov 24, 2009, at 2:24 PM, Andreas Wittmann wrote:
>>> Dear R-users,
>>>
>>> the following thread
>>> http://tolstoy.newcastle.edu.au/R/help/03b/3322.html
>>> discusses how to remove rows for predict that contain levels which
>>> are not in the model. Now I am trying to do this the other way
>>> round: I want to remove columns (variables) from the model which
>>> will later be problematic because of new levels at prediction time.
>>>
>>> ## example:
>>> set.seed(0)
>>> x <- rnorm(9)
>>> y <- x + rnorm(9)
>>>
>>> training <- data.frame(x=x, y=y, z=c(rep("A", 3), rep("B", 3), rep("C", 3)))
>>> test <- data.frame(x=t<-rnorm(1), y=t+rnorm(1), z="D")
>>>
>>> lm1 <- lm(x ~ ., data=training)
>>> ## prediction does not work because the variable z has the new level "D"
>>> predict(lm1, test)
>>>
>>> ## solution: the variable z is removed from the model
>>> ## the prediction happens without using the information of variable z
>>> lm2 <- lm(x ~ y, data=training)
>>> predict(lm2, test)
>>>
>>> How can I automatically recognize this and calculate accordingly?
>>
>> Let me get this straight. You want us to predict in advance (or, more
>> accurately, design an algorithm that can see into the future and work
>> around) any sort of newdata you might later construct?
>>
>> --
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] predict: remove columns with new levels automatically
On Nov 25, 2009, at 1:48 AM, Andreas Wittmann wrote:
> Sorry for my poor description. I don't want a ready-made algorithm
> without doing any work of my own; I only hoped to get some advice on
> how to do this. I don't want to predict arbitrary data: I refer only
> to newdata whose variables are the same as in the model data. But if
> there are factors in the data, it is possible that the newdata has a
> level which doesn't exist in the original data. So I have to compare
> each factor in the data and in the newdata, and if the newdata has a
> level which is not in the original data, drop this variable and
> compute the model and the prediction again. I thought this problem
> was quite common and that I could use an algorithm somebody had
> already implemented.
>
> best regards
> Andreas

If you use str() to look at the lm1 object you will find, near the
bottom, a list called xlevels. lm1$xlevels will show you the factor
levels that were present in the variables at the time the model was
created:

lm1$xlevels
$z
[1] "A" "B" "C"

New testing scenario with a good level and a bad level:

test <- data.frame(x=t<-rnorm(2), y=t+rnorm(2), z=c("B", "D"))
lm1 <- lm(x ~ ., data=training)
predict(lm1, subset(test, z %in% lm1$xlevels$z))  # get prediction for good level only
        1
0.4225204

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
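The xlevels idea above can be sketched as a small helper. This is only an illustration, not code from the thread: `unseen_levels` is a hypothetical name, the test values for y are made up, and the data frames are built with `stringsAsFactors = TRUE` so that z is a factor on current R versions as well.

```r
## Hypothetical helper: for each factor used by the model, list the values
## in `newdata` that were not among the levels recorded at fitting time.
unseen_levels <- function(model, newdata) {
  xlev <- model$xlevels                       # levels stored by lm()
  vars <- intersect(names(xlev), names(newdata))
  out <- lapply(vars, function(v) {
    setdiff(unique(as.character(newdata[[v]])), xlev[[v]])
  })
  names(out) <- vars
  out[lengths(out) > 0]                       # keep only offending variables
}

set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)
training <- data.frame(x = x, y = y, z = rep(c("A", "B", "C"), each = 3),
                       stringsAsFactors = TRUE)
test <- data.frame(y = c(0.1, 0.2), z = c("B", "D"),  # made-up test rows
                   stringsAsFactors = TRUE)

lm1 <- lm(x ~ ., data = training)
bad <- unseen_levels(lm1, test)               # names the variable and its new level

## Keep only the rows whose factor values were all seen during fitting.
ok <- rep(TRUE, nrow(test))
for (v in names(lm1$xlevels)) ok <- ok & test[[v]] %in% lm1$xlevels[[v]]
pred <- predict(lm1, droplevels(test[ok, , drop = FALSE]))
```

Filtering rows this way answers the question from the older thread; it does not drop the variable from the model, which is what the original poster asked for.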
Re: [R] predict: remove columns with new levels automatically
Thank you all for the good advice. I have now written a quick hack
which does what I was looking for; maybe someone else will find it
useful.

set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)

## stringsAsFactors = TRUE is needed in R >= 4.0.0, where strings are
## no longer converted to factors by default
training <- data.frame(x=x, y=y,
                       z1=c(rep("A", 3), rep("B", 3), rep("C", 3)),
                       z2=c(rep("F", 4), rep("G", 5)),
                       stringsAsFactors=TRUE)
test <- data.frame(x=t<-rnorm(1), y=t+rnorm(1), z1="D", z2="F",
                   stringsAsFactors=TRUE)

`predict.drop` <- function(f, dat, newdat)
{
  if (ncol(dat) != ncol(newdat))
    stop("dat and newdat must have the same number of columns!")

  ## levels of every factor column, NULL for non-factor columns
  `filllevs` <- function(d)
    lapply(d, function(col) if (is.factor(col)) levels(col))

  datlev <- filllevs(dat)
  newdatlev <- filllevs(newdat)

  ## mark factor columns whose levels in newdat are not all known from dat
  drop <- logical(ncol(dat))
  names(drop) <- colnames(dat)
  for (j in seq_len(ncol(dat)))
  {
    if (!is.null(datlev[[j]]) && !all(newdatlev[[j]] %in% datlev[[j]]))
      drop[j] <- TRUE
  }

  ## refit without the marked columns, then predict
  m <- lm(formula(f), data=dat[, !drop, drop=FALSE])
  p <- predict(m, newdat)

  return(list(drop=drop, p=p))
}

predict.drop(x ~ ., training, test)

best regards
Andreas

David Winsemius wrote:
> If you use str() to look at the lm1 object you will find, near the
> bottom, a list called xlevels. lm1$xlevels will show you the factor
> levels that were present in the variables at the time the model was
> created:
>
> lm1$xlevels
> $z
> [1] "A" "B" "C"
>
> New testing scenario with a good level and a bad level:
>
> test <- data.frame(x=t<-rnorm(2), y=t+rnorm(2), z=c("B", "D"))
> lm1 <- lm(x ~ ., data=training)
> predict(lm1, subset(test, z %in% lm1$xlevels$z))  # get prediction for good level only
>         1
> 0.4225204
>
> --
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
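An alternative to refitting by hand is to let update() remove the offending terms from an already fitted model. The following is a sketch under assumptions, not the method used in the thread: `refit_without_new_levels` is a hypothetical name, the model is written with an explicit formula rather than `x ~ .`, and the test value for y is made up.

```r
## Hypothetical helper: drop every factor predictor whose values in
## `newdata` include a level unknown to the model, then refit with update().
refit_without_new_levels <- function(model, newdata) {
  xlev <- model$xlevels
  bad <- names(xlev)[vapply(names(xlev), function(v) {
    !all(as.character(newdata[[v]]) %in% xlev[[v]])
  }, logical(1))]
  if (length(bad) == 0) return(list(model = model, dropped = character(0)))
  ## builds a formula such as ". ~ . - z1" and refits on the original data
  rhs <- paste(". ~ . -", paste(bad, collapse = " - "))
  list(model = update(model, as.formula(rhs)), dropped = bad)
}

set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)
training <- data.frame(x = x, y = y,
                       z1 = rep(c("A", "B", "C"), each = 3),
                       z2 = rep(c("F", "G"), times = c(4, 5)),
                       stringsAsFactors = TRUE)
test <- data.frame(y = 0.5, z1 = "D", z2 = "F",   # made-up test row
                   stringsAsFactors = TRUE)

fit <- lm(x ~ y + z1 + z2, data = training)
res <- refit_without_new_levels(fit, test)
res$dropped                         # variables removed from the model
p <- predict(res$model, test)       # prediction from the reduced model
```

Because update() re-evaluates the original call, the training data must still be available under the same name when the helper runs.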
[R] predict: remove columns with new levels automatically
Dear R-users,

the following thread
http://tolstoy.newcastle.edu.au/R/help/03b/3322.html
discusses how to remove rows for predict that contain levels which
are not in the model. Now I am trying to do this the other way round:
I want to remove columns (variables) from the model which will later
be problematic because of new levels at prediction time.

## example:
set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)

training <- data.frame(x=x, y=y, z=c(rep("A", 3), rep("B", 3), rep("C", 3)))
test <- data.frame(x=t<-rnorm(1), y=t+rnorm(1), z="D")

lm1 <- lm(x ~ ., data=training)
## prediction does not work because the variable z has the new level "D"
predict(lm1, test)

## solution: the variable z is removed from the model
## the prediction happens without using the information of variable z
lm2 <- lm(x ~ y, data=training)
predict(lm2, test)

How can I automatically recognize this and calculate accordingly?

Thanks
Andreas
Re: [R] predict: remove columns with new levels automatically
On Nov 24, 2009, at 2:24 PM, Andreas Wittmann wrote:
> Dear R-users,
>
> the following thread
> http://tolstoy.newcastle.edu.au/R/help/03b/3322.html
> discusses how to remove rows for predict that contain levels which
> are not in the model. Now I am trying to do this the other way round:
> I want to remove columns (variables) from the model which will later
> be problematic because of new levels at prediction time.
>
> ## example:
> set.seed(0)
> x <- rnorm(9)
> y <- x + rnorm(9)
>
> training <- data.frame(x=x, y=y, z=c(rep("A", 3), rep("B", 3), rep("C", 3)))
> test <- data.frame(x=t<-rnorm(1), y=t+rnorm(1), z="D")
>
> lm1 <- lm(x ~ ., data=training)
> ## prediction does not work because the variable z has the new level "D"
> predict(lm1, test)
>
> ## solution: the variable z is removed from the model
> ## the prediction happens without using the information of variable z
> lm2 <- lm(x ~ y, data=training)
> predict(lm2, test)
>
> How can I automatically recognize this and calculate accordingly?

Let me get this straight. You want us to predict in advance (or, more
accurately, design an algorithm that can see into the future and work
around) any sort of newdata you might later construct?

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] predict: remove columns with new levels automatically
Sorry for my poor description. I don't want a ready-made algorithm
without doing any work of my own; I only hoped to get some advice on
how to do this. I don't want to predict arbitrary data: I refer only
to newdata whose variables are the same as in the model data. But if
there are factors in the data, it is possible that the newdata has a
level which doesn't exist in the original data. So I have to compare
each factor in the data and in the newdata, and if the newdata has a
level which is not in the original data, drop this variable and
compute the model and the prediction again. I thought this problem
was quite common and that I could use an algorithm somebody had
already implemented.

best regards
Andreas

-------- Original Message --------
> Date: Wed, 25 Nov 2009 00:48:59 -0500
> From: David Winsemius dwinsem...@comcast.net
> To: Andreas Wittmann andreas_wittm...@gmx.de
> CC: r-help@r-project.org
> Subject: Re: [R] predict: remove columns with new levels automatically
>
> Let me get this straight. You want us to predict in advance (or, more
> accurately, design an algorithm that can see into the future and work
> around) any sort of newdata you might later construct?
>
> --
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
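The comparison described in this message (check each factor in the data against the same factor in the newdata) can be sketched without fitting a model at all. This is an illustration under assumptions: `cols_with_new_levels` is a hypothetical name, columns are matched by name, character columns are treated like factors, and the example data are made up.

```r
## Hypothetical helper: names of the categorical columns of `newdat`
## that contain values absent from the same column of `dat`.
cols_with_new_levels <- function(dat, newdat) {
  common <- intersect(names(dat), names(newdat))
  is_cat <- vapply(dat[common],
                   function(col) is.factor(col) || is.character(col),
                   logical(1))
  cat_cols <- common[is_cat]
  flagged <- vapply(cat_cols, function(v) {
    length(setdiff(unique(as.character(newdat[[v]])),
                   unique(as.character(dat[[v]])))) > 0
  }, logical(1))
  cat_cols[flagged]
}

training <- data.frame(y = 1:9,
                       z1 = rep(c("A", "B", "C"), each = 3),
                       z2 = rep(c("F", "G"), times = c(4, 5)),
                       stringsAsFactors = TRUE)
test <- data.frame(y = 10, z1 = "D", z2 = "F", stringsAsFactors = TRUE)

drop_cols <- cols_with_new_levels(training, test)
## training data without the problematic columns, ready for refitting
reduced <- training[, setdiff(names(training), drop_cols), drop = FALSE]
```

Note that this compares the values actually present in each column, so unused factor levels in either data frame are ignored.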