Does anyone know whether there are any special considerations when using Random
Forest with correlated fields, or more specifically with derived fields?

For example, if we are trying to predict who might leave our company to work
for another company, some of the variables we may look at (in addition to
others) are listed below. Do we need to be cautious about commingling these,
especially since, as with the Age variables, they are all based on the same
underlying variable, birthdate? And what about rollup fields: Age rolls up to
"Age Cohorts", and "Age Cohorts" rolls up to "Age Career Cohort"?
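Whether one field is a pure rollup of another can be checked directly: if every
value of the finer field maps to exactly one value of the coarser field, the
coarser field carries no information beyond the finer one. A minimal Python
sketch, assuming the two fields are available as parallel lists; the function
name `is_rollup` and the toy values are hypothetical.

```python
def is_rollup(fine, coarse):
    """Return True if `coarse` is a deterministic function of `fine`,
    i.e. each value of the finer field maps to exactly one coarser value."""
    if len(fine) != len(coarse):
        raise ValueError("fields must have the same length")
    mapping = {}
    for f, c in zip(fine, coarse):
        # setdefault stores the first coarse value seen for this fine value
        if mapping.setdefault(f, c) != c:
            return False  # same fine value, different coarse value
    return True


# Hypothetical toy records: age and a decade cohort derived from it.
ages = [24, 31, 38, 45, 31]
cohorts = ["20-30", "30-40", "30-40", "40-50", "30-40"]
print(is_rollup(ages, cohorts))  # True: the cohort is a function of age
print(is_rollup(cohorts, ages))  # False: "30-40" maps to both 31 and 38
```

The check is one-directional, as the second call shows: the cohort can be
recovered from age, but not the other way around.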


- BIRTHDATE-BASED VARIABLES

1. Age
2. Age Cohorts (e.g., 20-30, 30-40 years old, etc.)
3. Age Career Cohort (similar to the above but with wider bins, e.g., "Early
   (Age <35)", "Mid (Age 35-49)", etc.)
4. Birth year (probably not as a categorical variable in R, since it has more
   than 32 categories)
5. Generation (e.g., Boomers, Generation X, Generation Y, etc.)

# All of these are categorical variables except 'Birth year' and Age.

- HIRE-DATE-BASED VARIABLES

6. Years of Service
7. Years of Service Cohorts
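To make the dependence explicit, the fields in the list above can all be
computed from just two dates plus a reference date. A minimal Python sketch,
assuming completed-year arithmetic for Age and Years of Service; the function
names, the "Late (Age 50+)" label, and the 5-year service bins are
hypothetical, and Generation is omitted because its cutoffs vary by source.

```python
from datetime import date


def completed_years(start, ref):
    """Whole years elapsed from `start` to `ref` (used for age and service)."""
    before_anniversary = (ref.month, ref.day) < (start.month, start.day)
    return ref.year - start.year - int(before_anniversary)


def derive_fields(birthdate, hire_date, ref):
    """Derive birthdate- and hire-date-based fields from two dates.

    Bin edges not named in the original post are hypothetical.
    """
    age = completed_years(birthdate, ref)
    lower = (age // 10) * 10  # decade cohorts such as "30-40"
    if age < 35:
        career = "Early (Age <35)"
    elif age <= 49:
        career = "Mid (Age 35-49)"
    else:
        career = "Late (Age 50+)"  # hypothetical label for the remaining bin
    service = completed_years(hire_date, ref)
    svc_lower = (service // 5) * 5  # hypothetical 5-year service bins
    return {
        "age": age,
        "age_cohort": f"{lower}-{lower + 10}",
        "age_career_cohort": career,
        "birth_year": birthdate.year,
        "years_of_service": service,
        "service_cohort": f"{svc_lower}-{svc_lower + 5}",
    }


print(derive_fields(date(1980, 6, 15), date(2005, 3, 1), date(2015, 1, 1)))
```

For these hypothetical dates the sketch gives an age of 34 (cohort "30-40",
"Early (Age <35)") and 9 years of service (cohort "5-10"). Since every output
is a function of `birthdate` or `hire_date`, a model given several of these
fields is seeing the same two underlying dates several times over.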

Or, more generally, what about variables that are merely correlated? For
example, age and years of service are correlated (r ~ 0.57).
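As a rough reading, r ~ 0.57 corresponds to r^2 of about 0.32, so roughly a
third of the variance in one variable is linearly shared with the other. A
minimal Python sketch of the Pearson coefficient, for checking such pairs
before modelling; the function name `pearson_r` and the toy values are
hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, perfectly linear toy data: age vs. years of service
print(pearson_r([25, 35, 45], [2, 6, 10]))  # 1.0
```

Pearson r only measures linear association; a binned field such as an age
cohort is better checked with a rollup test than with a correlation.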


Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics & Metrics
Strategic Human Resources Management




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
