zhengruifeng commented on issue #27337: [SPARK-30545][ML][PYSPARK] Impl Extremely Randomized Trees
URL: https://github.com/apache/spark/pull/27337#issuecomment-578754541

@srowen Yes, the cost of finding the best split is not significant, since the bottleneck should be computing the histogram. I just had an idea: ERT does not need to compute the whole histogram; it can just draw a random split and accumulate the impurity statistics for its left and right sides. This way, the communication cost should be much lower than for RF. However, the current implementation of RF does not support this approach. I will try to figure out whether it works and how to reuse the RF implementation.
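
To make the idea concrete, here is a minimal single-machine sketch in plain Python. It is not Spark's RF code and not the implementation in this PR. The names `split_stats`, `merge_stats` and `weighted_gini` are hypothetical, Gini impurity is assumed, and the feature range used to draw the threshold is assumed to be known in advance. Each partition keeps only two class-count tables per (node, feature), one for each side of the random threshold, instead of one table per histogram bin.

```python
import random
from collections import Counter
from functools import reduce


def split_stats(rows, feature, threshold):
    """Class counts on each side of one fixed threshold (hypothetical helper).

    rows is an iterable of (features, label) pairs. Only two counters are
    kept, regardless of how many bins a full histogram would have.
    """
    left, right = Counter(), Counter()
    for features, label in rows:
        side = left if features[feature] <= threshold else right
        side[label] += 1
    return left, right


def merge_stats(a, b):
    """Combine per-partition statistics; this is all that would be shuffled."""
    return a[0] + b[0], a[1] + b[1]


def gini(counts):
    """Gini impurity of a class-count table (0.0 for an empty table)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def weighted_gini(left, right):
    """Impurity of the split: size-weighted average of both sides."""
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    if n == 0:
        return 0.0
    return (n_left * gini(left) + n_right * gini(right)) / n


# Toy data: one feature, two classes, split across two "partitions".
rows = [([1.0], 0), ([2.0], 0), ([3.0], 1), ([4.0], 1)]
partitions = [rows[:2], rows[2:]]

# Assumption: the feature range [1.0, 4.0] is already known.
rng = random.Random(0)
threshold = rng.uniform(1.0, 4.0)

partials = [split_stats(p, 0, threshold) for p in partitions]
left, right = reduce(merge_stats, partials)
score = weighted_gini(left, right)
```

In this sketch the merged statistics hold 2 x numClasses counts per candidate split, compared with numBins x numClasses for a full histogram, which is where the lower communication cost would come from. How the feature range is obtained in a distributed setting is left open here.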
