Assuming you have enough data, 1/4 to 1/2 of it is usually held out
for validation.
One reference would be
Picard, R.R. and Berk, K.N. (1990)
"Data Splitting," The American Statistician, 44, 140-147.
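
A minimal sketch of such a split, in Python purely for illustration since the thread has no code of its own. The function name split_rows, the default fraction of 1/3, and the fixed seed are all assumptions made for this example and are not taken from the reference above. It shuffles the rows once and holds out the requested fraction:

```python
import random


def split_rows(rows, validation_fraction=1 / 3, seed=0):
    """Randomly partition rows into (modeling, validation) lists.

    validation_fraction is the share of rows held out; the 1/4 to 1/2
    range suggested above is a rule of thumb, not a hard requirement.
    """
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    # The first n_validation shuffled rows are held out; the rest are for modeling.
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical data: 1000 row indices, with 1/4 held out for validation.
modeling, validation = split_rows(range(1000), validation_fraction=0.25)
print(len(modeling), len(validation))  # prints: 750 250
```

When comparing several models, fit each one on the modeling part only and score all of them on the same validation part, so the comparison is not affected by differences between splits.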
hth,
b.
-----Original Message-----
From: Wensui Liu [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 11, 2004 10:20 PM
To: [EMAIL PROTECTED]
Subject: [R] an off-topic question -> model validation
Currently, I am working on a data mining project and plan to divide
the data table into 2 parts, one for modeling and the other for
validation to compare several models.
But I am not sure what percentage of the data I should use to build
the models and what percentage I should hold out to validate them.
Is there any literature reference about this topic?
Thank you so much!
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html