zhengruifeng commented on issue #27337: [SPARK-30545][ML][PYSPARK] Impl Extremely Randomized Trees
URL: https://github.com/apache/spark/pull/27337#issuecomment-578405903

@huaxingao Good point, you are right. Scikit-Learn does exactly this:

```
# Draw a random threshold
current.threshold = rand_uniform(min_feature_value, max_feature_value, random_state)
```

In Scikit-Learn, both RF and ET grow trees greedily and choose thresholds from the exact feature values. However, in MLlib, all tree models are built on binned datasets (`treePoints`). To stay in line with the other tree models and to minimize the change, I prefer to randomly draw a split from the candidate splits built at the beginning of training. Otherwise, I would need to implement a completely new exact threshold-finding method.
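To make the difference between the two strategies concrete, here is a minimal sketch in plain Python. It is not the Spark or Scikit-Learn code: `build_candidate_splits`, `draw_exact_threshold`, and `draw_binned_split` are hypothetical helper names, and the binning rule (midpoints at evenly spaced ranks of the distinct values) is an assumption chosen only to stand in for the candidate splits computed once before training.

```python
import random


def build_candidate_splits(values, max_bins):
    """Hypothetical stand-in for the up-front binning step.

    Returns at most max_bins - 1 thresholds, each the midpoint between two
    adjacent distinct values, taken at evenly spaced ranks.
    """
    distinct = sorted(set(values))
    if len(distinct) <= 1 or max_bins < 2:
        return []  # nothing to split on
    num_splits = min(max_bins - 1, len(distinct) - 1)
    step = (len(distinct) - 1) / num_splits
    splits = []
    for i in range(num_splits):
        idx = int(i * step)
        splits.append((distinct[idx] + distinct[idx + 1]) / 2.0)
    return splits


def draw_exact_threshold(node_values, rng):
    """Exact-value strategy: a uniform draw between the min and max of the
    feature values that reach the current node."""
    return rng.uniform(min(node_values), max(node_values))


def draw_binned_split(candidate_splits, rng):
    """Binned strategy: pick one of the precomputed candidate splits
    uniformly at random instead of computing a new threshold."""
    return rng.choice(candidate_splits)
```

With the binned strategy the threshold can only take one of the precomputed values, so its resolution is bounded by the number of bins, but no per-node pass over the raw feature values is needed.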
