zhengruifeng opened a new pull request #27337: [SPARK-30545][ML][PYSPARK] Impl Extremely Randomized Trees
URL: https://github.com/apache/spark/pull/27337
 
 
   ### What changes were proposed in this pull request?
   Implement ExtraTrees in ML.
   The implementation is quite similar to RandomForest, with two main differences:
   1. bootstrap sampling is disabled by default;
   2. at each node to be split, candidate splits are drawn at random for each feature, and the best of these randomly chosen splits is selected. Scikit-Learn's implementation draws only one split per feature (see [`RandomSplitter`](https://github.com/scikit-learn/scikit-learn/blob/99d3d34d615a7d7b541b24d264b1108238c1953e/sklearn/tree/_splitter.pyx#L591)). In this PR, I add an expert param `numRandomSplitsPerFeature` to control the number of random splits drawn for each feature.
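The two differences can be illustrated with a minimal, single-machine sketch in plain Python. This is not the PR's Scala implementation: it assumes continuous features, Gini impurity, and thresholds drawn uniformly between the feature's minimum and maximum at the node, in the spirit of Scikit-Learn's `RandomSplitter`. How the PR draws thresholds over Spark's binned features is not described here. The names `tree_sample`, `random_split` and `num_random_splits_per_feature` are hypothetical; the last one only mirrors the proposed `numRandomSplitsPerFeature` param. The `features` argument stands for the per-node random feature subset that both RandomForest and ExtraTrees use.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def tree_sample(n_rows: int, bootstrap: bool, rng: random.Random) -> List[int]:
    """Row indices used to grow one tree (hypothetical helper).

    RandomForest-style: draw n_rows indices with replacement.
    ExtraTrees default: bootstrap disabled, so every row is used once.
    """
    if bootstrap:
        return [rng.randrange(n_rows) for _ in range(n_rows)]
    return list(range(n_rows))


def random_split(
    X: Sequence[Row],
    y: Sequence[int],
    features: Sequence[int],
    num_random_splits_per_feature: int,
    rng: random.Random,
) -> Optional[Tuple[int, float, float]]:
    """Keep the best of a few randomly drawn thresholds (hypothetical helper).

    For each candidate feature, draw `num_random_splits_per_feature`
    thresholds uniformly in [min, max] of that feature at this node, and
    keep the (feature, threshold) with the largest Gini impurity decrease.
    Returns (feature, threshold, gain), or None if no valid split exists.
    """
    parent = gini(y)
    n = len(y)
    best: Optional[Tuple[int, float, float]] = None
    for f in features:
        values = [row[f] for row in X]
        lo, hi = min(values), max(values)
        if lo == hi:  # constant feature at this node: no valid threshold
            continue
        for _ in range(num_random_splits_per_feature):
            t = rng.uniform(lo, hi)
            left = [label for v, label in zip(values, y) if v <= t]
            right = [label for v, label in zip(values, y) if v > t]
            if not left or not right:  # threshold landed on an endpoint
                continue
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - weighted
            if best is None or gain > best[2]:
                best = (f, t, gain)
    return best
```

For example, with `X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]` and `y = [0, 0, 1, 1]`, `random_split(X, y, [0, 1], 3, random.Random(0))` skips the constant feature `1` and returns a random threshold on feature `0`. Setting `num_random_splits_per_feature=1` corresponds to the Scikit-Learn behaviour, while larger values trade some randomness for a better split at each node.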
   
   
   ### Why are the changes needed?
   1. Extremely Randomized Trees (ExtraTrees) are widely used and are implemented in Scikit-Learn and OpenCV;
   
   2. ExtraTrees is quite similar to RandomForest. The main difference is that, at each node to be split, candidate splits (only one per feature in Scikit-Learn's implementation) are drawn at random for each feature, and the best of these randomly chosen splits is selected.
   
   ### Does this PR introduce any user-facing change?
   Yes
   
   
   ### How was this patch tested?
   Added test suites.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.