[R] Variable selection based on both training and testing data

2012-01-30 Thread Jin Minming
Dear all,

Variable selection in regression is usually determined by the training data, 
using AIC or an F test, e.g. with stepAIC. Is there an R package that can take 
both the training and the test data set into account? For example, I have 
separate training and test data sets. First, a regression model is fitted to the 
training data, and then this model is evaluated on the test data. This process 
is repeated to find candidate optimal models in terms of RMSE or R-squared on 
both the training and the test data. 

Thanks,

Jim
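This sketch shows the workflow in the question written so that the test data never influence which variables are chosen. It is a minimal Python sketch, not R and not what stepAIC does internally. It uses greedy forward selection on ordinary least squares, scored by AIC in the form n*log(RSS/n) + 2k, where k is the number of coefficients. As far as I know, that is the form R's extractAIC reports for lm fits; stepAIC also supports backward and bidirectional search. The synthetic data, in which only x0 and x2 matter, and all function names are hypothetical. The test set is scored once, with RMSE and R-squared, after selection is finished.

```python
import math
import random

# Hypothetical helpers: plain-Python OLS so the sketch needs no extra packages.
def fit_ols(X, y):
    """Return OLS coefficients [intercept, b1, ...] via the normal equations."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(A[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]

def take(X, cols):
    """Keep only the predictor columns listed in cols."""
    return [[row[j] for j in cols] for row in X]

def rss(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat))

def aic(X, y, cols):
    """Gaussian linear-model AIC: n*log(RSS/n) + 2k, k = number of coefficients."""
    Xs = take(X, cols)
    beta = fit_ols(Xs, y)
    n = len(y)
    return n * math.log(rss(y, predict(beta, Xs)) / n) + 2 * (len(cols) + 1)

def forward_aic(X, y):
    """Greedy forward selection: add the predictor that lowers AIC most; stop when none does."""
    chosen, remaining = [], list(range(len(X[0])))
    best = aic(X, y, chosen)
    while remaining:
        score, j = min((aic(X, y, chosen + [j]), j) for j in remaining)
        if score >= best:
            break
        best = score
        chosen.append(j)
        remaining.remove(j)
    return sorted(chosen)

def rmse(y, yhat):
    return math.sqrt(rss(y, yhat) / len(y))

def r_squared(y, yhat):
    mean = sum(y) / len(y)
    return 1 - rss(y, yhat) / sum((v - mean) ** 2 for v in y)

# Synthetic data (an assumption for illustration): only x0 and x2 matter.
rng = random.Random(0)

def make_data(n, p=5):
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 + 2 * r[0] - 1.5 * r[2] + rng.gauss(0, 1) for r in X]
    return X, y

X_train, y_train = make_data(80)
X_test, y_test = make_data(40)

# 1. Select variables using the training data only.
selected = forward_aic(X_train, y_train)
beta = fit_ols(take(X_train, selected), y_train)

# 2. Score the chosen model once on the test data.
pred = predict(beta, take(X_test, selected))
test_rmse = rmse(y_test, pred)
test_r2 = r_squared(y_test, pred)
print("selected:", selected, "test RMSE: %.3f" % test_rmse, "test R^2: %.3f" % test_r2)
```

The test RMSE and R-squared here only report how an already-chosen model generalizes. If they were fed back into the search, the test set would become part of the selection.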

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Variable selection based on both training and testing data

2012-01-30 Thread Liaw, Andy
Variable selection is part of the training process; it chooses the model. By 
definition, test data is used only for testing (evaluating the chosen model).

If you find a package or function that does variable selection on test data, 
run from it!

Best,
Andy 
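This sketch makes Andy's point concrete: every comparison between candidate models happens inside the training data. The technique swapped in here is k-fold cross-validation over all subsets of a few predictors; nobody in this thread proposed it. It is written in Python rather than R. The data, the 4 predictors, k = 5, and all function names are illustrative assumptions. The test set is used exactly once, after the subset has been fixed.

```python
import itertools
import math
import random

# Hypothetical helpers: plain-Python OLS so the sketch needs no extra packages.
def fit_ols(X, y):
    """Return OLS coefficients [intercept, b1, ...] via the normal equations."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]

def take(X, cols):
    return [[row[j] for j in cols] for row in X]

def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def cv_rmse(X, y, cols, k=5, seed=1):
    """Mean held-out RMSE over k folds, computed from the training data only."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)  # same seed -> same folds for every subset
    errors = []
    for f in range(k):
        held = idx[f::k]
        held_set = set(held)
        fit_idx = [i for i in idx if i not in held_set]
        beta = fit_ols(take([X[i] for i in fit_idx], cols), [y[i] for i in fit_idx])
        yhat = predict(beta, take([X[i] for i in held], cols))
        errors.append(rmse([y[i] for i in held], yhat))
    return sum(errors) / k

# Synthetic data (an assumption for illustration): only x0 and x2 matter.
rng = random.Random(42)

def make_data(n, p=4):
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 + 2 * r[0] - 1.5 * r[2] + rng.gauss(0, 1) for r in X]
    return X, y

X_train, y_train = make_data(60)
X_test, y_test = make_data(40)

# Compare every subset of the 4 predictors by cross-validated RMSE (training set only).
subsets = [list(c) for r in range(5) for c in itertools.combinations(range(4), r)]
best = min(subsets, key=lambda cols: cv_rmse(X_train, y_train, cols))

# Only now touch the test set, once, for the chosen subset.
beta = fit_ols(take(X_train, best), y_train)
test_rmse = rmse(y_test, predict(beta, take(X_test, best)))
print("chosen:", best, "test RMSE: %.3f" % test_rmse)
```

The same fold assignment is reused for every subset, so all candidates are compared on identical splits. The final test RMSE played no part in picking the model, so it remains an honest estimate of out-of-sample error.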

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Jin Minming
 Sent: Monday, January 30, 2012 8:14 AM
 To: r-help@r-project.org
 Subject: [R] Variable selection based on both training and 
 testing data
 
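Notice:  This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
New Jersey, USA 08889), and/or its affiliates Direct contact information
for affiliates is available at 
http://www.merck.com/contact/contacts.html) that may be confidential,
proprietary copyrighted and/or legally privileged. It is intended solely
for the use of the individual or entity named on this message. If you are
not the intended recipient, and have received this message in error,
please notify us immediately by reply e-mail and then delete it from 
your system.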



Re: [R] Variable selection based on both training and testing data

2012-01-30 Thread Jin Minming
I do not have enough test data to fit a regression model on its own, although I 
know there are statistical regression methods that can be used for small data 
sets. That is why I need to build a model first using the training data set.

Thanks,

Jim
 

--- On Mon, 30/1/12, Liaw, Andy andy_l...@merck.com wrote:

 From: Liaw, Andy andy_l...@merck.com
 Subject: RE: [R] Variable selection based on both training and testing data
 To: 'Jin Minming' jminm...@yahoo.com, r-help@r-project.org 
 r-help@r-project.org
 Date: Monday, 30 January, 2012, 13:39