Re: [R] predict: remove columns with new levels automatically

2009-11-25 Thread Peter Ehlers


Andreas Wittmann wrote:

Sorry for my bad description. I don't want to get a ready-made algorithm
without doing my own work; I only hoped to get some advice on how to do this.
I don't want to predict arbitrary data; I am only referring to newdata whose
variables are the same as in the model data. But if there are factors in the
data, it is possible that the newdata has a level which doesn't exist in the
original data.
So I have to compare each factor in the data and in the newdata, and if the
newdata has a level which is not in the original data, drop this variable and
compute the model and the prediction again.
I thought this problem was quite common and that I could use an algorithm
somebody has already implemented.


best regards

Andreas


If I understand correctly, you want to build a model that
includes at least one factor predictor (say xf with k levels).
Then you want to use this model to predict a response value
when xf takes a _new_ level about which the model knows
nothing. That doesn't make sense to me, so I doubt that
it's a common problem. Introducing a new level for a factor
variable is just like introducing a new variable.

 -Peter Ehlers
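
The point about a new level behaving like a new variable can be seen in the fitted coefficients. What follows is only a minimal sketch, assuming the training data from the original post: the model holds one coefficient per non-reference level of z that it was fitted on, and nothing that could be applied to a level D.

```r
set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)
training <- data.frame(x = x, y = y, z = rep(c("A", "B", "C"), each = 3))

lm1 <- lm(x ~ ., data = training)

## With the default treatment contrasts, A is the reference level and
## B and C each get their own dummy-variable coefficient.
coef_names <- names(coef(lm1))
print(coef_names)

## No coefficient exists for a level D, so the model has nothing to say
## about an observation with z == "D".
has_D <- "zD" %in% coef_names
print(has_D)
```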









__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predict: remove columns with new levels automatically

2009-11-25 Thread David Winsemius



If you use str() to look at the lm1 object, you will find toward the bottom a
list called xlevels.

lm1$xlevels (which lm1$x reaches through partial matching of $) shows the
levels that were present in each factor variable at the time the model was
fitted:

> lm1$xlevels
$z
[1] "A" "B" "C"

New testing scenario with one good level and one bad level:

> test <- data.frame(x=t<-rnorm(2), y=t+rnorm(2), z=c("B", "D") )
> lm1 <- lm(x ~ ., data=training)
> predict(lm1, subset(test, z %in% lm1$xlevels$z) )  # prediction for the
                                                     # good level only
        1
0.4225204
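
The same check can be run over every factor in a model at once. This is a minimal sketch, assuming the training data from the original post; `unseen_level_vars` is a hypothetical helper name chosen for illustration and is not part of R.

```r
set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)
training <- data.frame(x = x, y = y, z = rep(c("A", "B", "C"), each = 3))
lm1 <- lm(x ~ ., data = training)

## Hypothetical helper (not part of R): names of the factor predictors
## whose values in newdata include a level the model was not fitted on.
unseen_level_vars <- function(model, newdata) {
  xl <- model$xlevels                      # levels recorded at fit time
  bad <- vapply(names(xl), function(v) {
    vals <- as.character(newdata[[v]])
    vals <- vals[!is.na(vals)]             # missing values are not "new levels"
    !all(vals %in% xl[[v]])
  }, logical(1))
  as.character(names(xl)[bad])
}

test_ok  <- data.frame(y = 0.5, z = "B")
test_bad <- data.frame(y = 0.5, z = "D")

ok_vars  <- unseen_level_vars(lm1, test_ok)   # nothing flagged
bad_vars <- unseen_level_vars(lm1, test_bad)  # z is flagged
print(ok_vars)
print(bad_vars)
```

Reading the levels from the fitted object rather than from the training data frame means the check also works when only the model is at hand.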







David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] predict: remove columns with new levels automatically

2009-11-25 Thread Andreas Wittmann

Thank you all for the good advice.

I have now written a quick hack which does what I was looking for; maybe
someone else will find it useful.



set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)

## stringsAsFactors=TRUE is needed from R 4.0.0 on, where data.frame()
## no longer converts strings to factors by default
training <- data.frame(x=x, y=y,
  z1=c(rep("A", 3), rep("B", 3), rep("C", 3)),
  z2=c(rep("F", 4), rep("G", 5)), stringsAsFactors=TRUE)
test <- data.frame(x=t<-rnorm(1), y=t+rnorm(1), z1="D", z2="F",
  stringsAsFactors=TRUE)


`predict.drop` <- function(f, dat, newdat)
{
  datlev <- vector("list", ncol(dat))
  newdatlev <- vector("list", ncol(newdat))

  ## collect the levels of every factor column;
  ## entries for non-factor columns stay NULL
  `filllevs` <- function(dat, veclev)
  {
    for (j in seq_len(ncol(dat)))
    {
      if (is.factor(dat[,j]))
        veclev[[j]] <- levels(dat[,j])
    }

    return(veclev)
  }

  datlev <- filllevs(dat, datlev)
  newdatlev <- filllevs(newdat, newdatlev)

  if (ncol(dat) == ncol(newdat))
  {
    drop <- logical(ncol(dat))
    names(drop) <- colnames(dat)

    for (j in seq_len(ncol(dat)))
    {
      if (!is.null(datlev[[j]]))
      {
        ## drop the column if newdat has any level unknown to dat
        if (!all(newdatlev[[j]] %in% datlev[[j]]))
          drop[j] <- TRUE
      }
    }
  }
  else
    stop("dat and newdat must have the same number of columns!")

  ## refit without the dropped columns, then predict
  m <- lm(formula(f), data=dat[, !drop, drop=FALSE])
  p <- predict(m, newdat)

  return(list(drop=drop, p=p))
}


predict.drop(x ~ ., training, test)


best regards

Andreas









[R] predict: remove columns with new levels automatically

2009-11-24 Thread Andreas Wittmann

Dear R-users,

the following thread

http://tolstoy.newcastle.edu.au/R/help/03b/3322.html

deals with the problem of how to remove rows for predict that contain levels
which are not in the model.


Now I am trying to do this the other way round: I want to remove columns
(variables) from the model which would later be problematic because of new
levels at prediction time.


## example:
set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)

training <- data.frame(x=x, y=y, z=c(rep("A", 3), rep("B", 3), rep("C", 3)))
test <- data.frame(x=t<-rnorm(1), y=t+rnorm(1), z="D")

lm1 <- lm(x ~ ., data=training)
## prediction does not work because the variable z has the new level D
predict(lm1, test)

## solution: the variable z is removed from the model
## the prediction happens without using the information of variable z
lm2 <- lm(x ~ y, data=training)
predict(lm2, test)

How can I automatically recognize this situation and compute accordingly?

Thanks

Andreas



Re: [R] predict: remove columns with new levels automatically

2009-11-24 Thread David Winsemius



Let me get this straight. You want us to predict in advance (or, more
accurately, design an algorithm that can see into the future and work
around) any sort of newdata you might later construct?


--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] predict: remove columns with new levels automatically

2009-11-24 Thread Andreas Wittmann

Sorry for my bad description. I don't want to get a ready-made algorithm
without doing my own work; I only hoped to get some advice on how to do this.
I don't want to predict arbitrary data; I am only referring to newdata whose
variables are the same as in the model data. But if there are factors in the
data, it is possible that the newdata has a level which doesn't exist in the
original data.
So I have to compare each factor in the data and in the newdata, and if the
newdata has a level which is not in the original data, drop this variable and
compute the model and the prediction again.
I thought this problem was quite common and that I could use an algorithm
somebody has already implemented.

best regards

Andreas
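
The procedure described above (compare levels, drop the offending variables, refit, predict) might be sketched as follows. This is only an illustration under stated assumptions: `predict_dropping` is a hypothetical name, the data extend the example from the original post with a second factor, and the model is assumed to contain main effects only, so that term labels coincide with column names.

```r
## Hypothetical helper (not from the thread and not part of R): refit an lm
## without the factor predictors that show unseen levels in newdata, then
## predict. Assumes main effects only (no interactions, no transformed terms).
predict_dropping <- function(model, data, newdata) {
  xl <- model$xlevels
  seen <- vapply(names(xl), function(v) {
    all(as.character(newdata[[v]]) %in% xl[[v]])
  }, logical(1))
  bad <- as.character(names(xl)[!seen])

  ## nothing to drop: use the model as it is
  if (length(bad) == 0)
    return(list(dropped = bad, p = predict(model, newdata)))

  keep <- setdiff(attr(terms(model), "term.labels"), bad)
  if (length(keep) == 0) keep <- "1"       # intercept-only fallback
  f <- reformulate(keep, response = formula(model)[[2]])
  refit <- lm(f, data = data)
  list(dropped = bad, p = predict(refit, newdata))
}

set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)
training <- data.frame(x = x, y = y,
                       z1 = rep(c("A", "B", "C"), each = 3),
                       z2 = rep(c("F", "G"), times = c(4, 5)))
test <- data.frame(y = 0.3, z1 = "D", z2 = "F")

fit <- lm(x ~ ., data = training)
res <- predict_dropping(fit, training, test)
print(res$dropped)   # variables removed before refitting
print(res$p)         # prediction from the reduced model
```

Note that the refitted model differs from the original one in all of its coefficients, not only in the missing term, so predictions for rows with known levels change as well.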




