[R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables
Dear Researchers,

Sorry for the easy and common question. I am trying to justify the claim that random forests (RF) do not require transformations (e.g. logarithmic) of the variables, comparing this nonparametric method with, e.g., linear regression. In the literature on the phenomenon I am studying, a logarithmic transformation is applied to describe the model, but I have found that RF does not require this approach. Could someone suggest a text or references to study?

Thanks in advance,
Gianni

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables
Tree-based models (such as RF) are invariant to monotonic transformations of the predictor (x) variables, because they only use the ranks of the variables, not their actual values. More specifically, they look for splits that are at the mid-points between adjacent unique values. Thus the resulting trees are basically identical regardless of how you transform the x variables. The only, probably minor, difference is that the mid-points can differ between the original and the transformed data. While this doesn't affect the training data, it can affect predictions on test data (although the difference should be slight).

Transformation of the response variable is quite another thing. RF needs it just as much as other methods if the situation calls for it.

Cheers,
Andy

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of gianni lavaredo
Sent: Monday, December 05, 2011 1:41 PM
To: r-help@r-project.org
Subject: [R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables
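The mid-point argument in the reply above can be checked on a toy example. What follows is only a minimal sketch in plain Python, not the randomForest package: `sse` and `best_split` are hypothetical helper names, the data are made up, and a single regression split stands in for a whole tree. It assumes the split is chosen by minimising the summed squared error, with candidate thresholds at the mid-points between adjacent unique x values.

```python
import math


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Find the single split on xs that minimises the total squared error of ys.

    Candidate thresholds are the mid-points between adjacent unique x values.
    Returns (threshold, set of row indices sent to the left node).
    """
    uniq = sorted(set(xs))
    best = None
    for lo, hi in zip(uniq, uniq[1:]):
        thr = (lo + hi) / 2.0
        left = [i for i in range(len(xs)) if xs[i] <= thr]
        right = [i for i in range(len(xs)) if xs[i] > thr]
        cost = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
        if best is None or cost < best[0]:
            best = (cost, thr, frozenset(left))
    return best[1], best[2]


# Made-up data: the response jumps between x = 8 and x = 100.
x = [1.0, 2.0, 4.0, 8.0, 100.0, 200.0, 400.0, 800.0]
y = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
log_x = [math.log(v) for v in x]

thr_raw, left_raw = best_split(x, y)
thr_log, left_log = best_split(log_x, y)

# Same partition of the training rows on both scales.
print(left_raw == left_log)

# The thresholds differ once mapped back to the original scale.
print(thr_raw, math.exp(thr_log))

# A new point that falls between the two thresholds is routed differently.
x_new = 40.0
goes_left_raw = x_new <= thr_raw
goes_left_log = math.log(x_new) <= thr_log
print(goes_left_raw, goes_left_log)
```

On these data the training rows are partitioned identically on both scales, but the threshold is the arithmetic mean of 8 and 100 on the raw scale and their geometric mean (about 28.3) after the log transform. A test value such as 40 therefore lands on different sides of the split, which is the small test-set difference mentioned above.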
Re: [R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables
About the statement "because they only use the ranks of the variables": using leave-one-out, in each iteration the predictor variable ranks change slightly every time RF builds the model, especially for the variables with low importance. Is it correct to attribute this to the random splitting?

Thanks in advance,
Gianni

On Mon, Dec 5, 2011 at 7:59 PM, Liaw, Andy andy_l...@merck.com wrote:
Tree-based models (such as RF) are invariant to monotonic transformations of the predictor (x) variables, because they only use the ranks of the variables, not their actual values. [...]
Re: [R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables
You should see no differences beyond what you'd get by running RF a second time with a different random number seed.

Best,
Andy

From: gianni lavaredo [mailto:gianni.lavar...@gmail.com]
Sent: Monday, December 05, 2011 2:19 PM
To: Liaw, Andy
Cc: r-help@r-project.org
Subject: Re: [R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables
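The point about random seeds can also be illustrated with a toy experiment. This is a rough sketch under stated assumptions, not the randomForest algorithm: it bags single-split trees on bootstrap samples and records which predictor each root split uses, as a crude stand-in for variable importance. The helper names (`best_feature`, `bagged_choices`) and the data are hypothetical.

```python
import math
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_feature(columns, ys):
    """Return the index of the column whose best mid-point split has the lowest cost."""
    best_cost, best_col = None, None
    for col, xs in enumerate(columns):
        uniq = sorted(set(xs))
        for lo, hi in zip(uniq, uniq[1:]):
            thr = (lo + hi) / 2.0
            left = [y for x, y in zip(xs, ys) if x <= thr]
            right = [y for x, y in zip(xs, ys) if x > thr]
            cost = sse(left) + sse(right)
            if best_cost is None or cost < best_cost:
                best_cost, best_col = cost, col
    return best_col


def bagged_choices(columns, ys, seed, n_trees=25):
    """For each bootstrap sample, record which column the root split uses."""
    rng = random.Random(seed)
    n = len(ys)
    chosen = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap row indices
        sample_cols = [[xs[i] for i in rows] for xs in columns]
        sample_y = [ys[i] for i in rows]
        chosen.append(best_feature(sample_cols, sample_y))
    return chosen


# Made-up data: two predictors that are equally informative about y.
x1 = [float(2 ** k) for k in range(1, 13)]
x2 = [float((7 * k) % 13 + 1) for k in range(12)]
y = [(0.0 if a < 100 else 1.0) + (0.0 if b <= 6 else 1.0) for a, b in zip(x1, x2)]

raw = [x1, x2]
logged = [[math.log(v) for v in xs] for xs in raw]

same_seed_raw = bagged_choices(raw, y, seed=1)
same_seed_log = bagged_choices(logged, y, seed=1)
other_seed_raw = bagged_choices(raw, y, seed=2)

# Same seed, raw versus log-transformed predictors: identical choices.
print(same_seed_raw == same_seed_log)

# Different seeds on the same data: the counts for x1 are free to differ.
print(same_seed_raw.count(0), other_seed_raw.count(0))
```

With the seed held fixed, the log transform leaves every chosen predictor unchanged, because the bootstrap samples and the candidate partitions are the same. Any variation in how often each predictor is picked comes only from changing the seed, which is consistent with the reply above.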