zhengruifeng opened a new pull request #27337: [SPARK-30545][ML][PYSPARK] Impl Extremely Randomized Trees
URL: https://github.com/apache/spark/pull/27337

### What changes were proposed in this pull request?
Implement ExtraTrees in ML. The implementation is quite similar to RandomForest, with two main differences:
1. bootstrap sampling is disabled by default;
2. on each leaf, candidate splits are drawn at random for each feature (only one split per feature in Scikit-Learn's implementation, see [`RandomSplitter`](https://github.com/scikit-learn/scikit-learn/blob/99d3d34d615a7d7b541b24d264b1108238c1953e/sklearn/tree/_splitter.pyx#L591)), and the best of these randomly chosen splits is selected.

In this PR, I add an expert param `numRandomSplitsPerFeature` to control the number of random splits drawn for each feature.

### Why are the changes needed?
1. Extremely Randomized Trees (ExtraTrees) is widely used and is implemented in Scikit-Learn and OpenCV;
2. ExtraTrees is quite similar to RandomForest; the main difference is that, on each leaf, candidate splits (only one split per feature in Scikit-Learn's implementation) are drawn at random for each feature and the best of these randomly chosen splits is selected.

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
Added test suites.
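
For readers unfamiliar with the split rule described above, here is a minimal, self-contained sketch in plain Python. It is not the code in this PR and not Spark's or Scikit-Learn's implementation. The function names, the Gini criterion, and the uniform draw between each feature's minimum and maximum are assumptions made for illustration; only the idea of drawing `numRandomSplitsPerFeature` random candidates per feature and keeping the best one comes from the description.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_random_split(rows, labels, num_random_splits_per_feature=1, rng=None):
    """Draw random thresholds for every feature and keep the best one.

    Hypothetical helper: returns (gain, feature_index, threshold),
    or None when no valid split was drawn.
    """
    if rng is None:
        rng = random.Random()
    n = len(rows)
    parent_impurity = gini(labels)
    best = None
    for f in range(len(rows[0])):
        values = [r[f] for r in rows]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature, nothing to split on
        for _ in range(num_random_splits_per_feature):
            # Assumption: threshold drawn uniformly between min and max.
            t = rng.uniform(lo, hi)
            left = [y for v, y in zip(values, labels) if v <= t]
            right = [y for v, y in zip(values, labels) if v > t]
            if not left or not right:
                continue  # degenerate split, skip it
            child_impurity = (
                len(left) * gini(left) + len(right) * gini(right)
            ) / n
            gain = parent_impurity - child_impurity
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


if __name__ == "__main__":
    rows = [[0.0, 5.0], [1.0, 5.0], [10.0, 5.0], [11.0, 5.0]]
    labels = [0, 0, 1, 1]
    print(best_random_split(rows, labels, 50, random.Random(0)))
```

With `num_random_splits_per_feature=1` this corresponds to the one-candidate-per-feature behaviour attributed to Scikit-Learn above. Larger values evaluate more random candidates per feature before choosing.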
