[R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables

2011-12-05 Thread gianni lavaredo
Dear Researchers,

Sorry for the easy and common question. I am trying to justify the idea that
random forests do not require transformations (e.g. logarithmic) of the
variables, comparing this nonparametric method with, e.g., linear
regression. In the literature on my phenomenon, a logarithmic transformation
is needed to describe the model, but I found that RF does not
require this approach. Could someone suggest a text or bibliography
to study?

thanks in advance

Gianni

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables

2011-12-05 Thread Liaw, Andy
Tree-based models (such as RF) are invariant to monotonic transformations of the 
predictor (x) variables, because they only use the ranks of the variables, not 
their actual values.  More specifically, they look for splits that are at the 
mid-points between adjacent unique values.  Thus the resulting trees are 
basically identical regardless of how you transform the x variables.
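
A minimal sketch of the point above, written in plain Python rather than taken from the randomForest code itself. The helper `best_split` and the data are hypothetical: it scans the mid-points between adjacent unique x values and keeps the one with the smallest residual sum of squares. Because log() preserves the ordering of x, the chosen split should put the same observations on each side whether or not x is transformed; only the threshold value changes.

```python
import math

def best_split(x, y):
    """Return (threshold, set of row indices going left) for the
    split on a single predictor that minimises the total SSE."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    uniq = sorted(set(x))
    best = None
    # Candidate thresholds: mid-points between adjacent unique values.
    for lo, hi in zip(uniq, uniq[1:]):
        thr = (lo + hi) / 2.0
        left = [i for i, v in enumerate(x) if v <= thr]
        right = [i for i, v in enumerate(x) if v > thr]
        cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
        if best is None or cost < best[0]:
            best = (cost, thr, frozenset(left))
    return best[1], best[2]

# Made-up data: a skewed predictor and a step-shaped response.
x = [1.0, 2.0, 5.0, 40.0, 100.0, 900.0]
y = [1.1, 0.9, 1.0, 3.2, 2.9, 3.1]

thr_raw, left_raw = best_split(x, y)
thr_log, left_log = best_split([math.log(v) for v in x], y)

# The thresholds differ (compare them on the original scale) ...
print(thr_raw, math.exp(thr_log))
# ... but the same rows go left in both fits.
print(left_raw == left_log)
```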

Of course, the only, probably minor, difference is that, e.g., the mid-points 
can be different between the original and transformed data.  While this doesn't 
impact the fit on the training data, it can impact the predictions on test data 
(although the difference should be slight).
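
To make the mid-point remark concrete, here is a small Python sketch with made-up numbers (the values 5, 40 and 18 and the helper `side` are assumptions for illustration only). The mid-point between two adjacent training values is an arithmetic mean on the original scale but corresponds to a geometric mean after a log transformation, so a new observation that falls in the gap between them can end up on different sides of the split.

```python
import math

lo, hi = 5.0, 40.0   # two adjacent training values around a split (made up)

thr_raw = (lo + hi) / 2.0                      # mid-point on the original scale
thr_log = (math.log(lo) + math.log(hi)) / 2.0  # mid-point on the log scale

def side(value, threshold):
    """Which child node a value is sent to."""
    return "left" if value <= threshold else "right"

# Training values are routed identically under both scales.
for v in (lo, hi):
    assert side(v, thr_raw) == side(math.log(v), thr_log)

# A test value inside the gap can be routed differently.
x_new = 18.0
print(side(x_new, thr_raw), side(math.log(x_new), thr_log))
```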

Transformation of the response variable is quite another thing.  RF needs it 
just as much as other methods if the situation calls for it.
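
One way to see why the response is different, sketched in Python with made-up numbers and assuming a regression tree that predicts the average response in a terminal node: averaging does not commute with a nonlinear transformation such as log. Fitting on log(y) and back-transforming therefore gives a different prediction (the geometric mean) than fitting on y (the arithmetic mean), even when the node contains exactly the same observations.

```python
import math

node_y = [1.0, 10.0, 100.0]   # responses in one terminal node (made up)

# Prediction when the tree is fitted to y directly: arithmetic mean.
pred_raw = sum(node_y) / len(node_y)

# Prediction when the tree is fitted to log(y), then back-transformed:
# exp(mean(log y)) is the geometric mean, which is never larger.
pred_log = sum(math.log(v) for v in node_y) / len(node_y)
pred_back = math.exp(pred_log)

print(pred_raw, pred_back)
```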

Cheers,
Andy
 

Re: [R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables

2011-12-05 Thread gianni lavaredo
About the statement "because they only use the ranks of the variables": using
leave-one-out, in each iteration the ranks of the predictor variables change
slightly every time RF builds the model, especially for the variables with
low importance. Is it correct to attribute this to the random
splitting?

Thanks in advance
Gianni


Re: [R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables

2011-12-05 Thread Liaw, Andy
You should see no differences beyond what you'd get by running RF a second time 
with a different random number seed.

Best,
Andy

