huaxingao commented on issue #27337: [SPARK-30545][ML][PYSPARK] Impl Extremely Randomized Trees URL: https://github.com/apache/spark/pull/27337#issuecomment-578382984 @zhengruifeng I took a quick look of how the random split is picked. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.7485&rep=rep1&type=pdf  It seems to me that in the paper, the random threshold is found by sampling from the continuous uniform distribution [min(feature), max(feature)]. Seems to me that in your implementation, the random threshold is sampled uniformly (and discretely) from the set of possible splits generated by findSplits(). I know your way is simpler, but I am not sure if it is close enough to the original method in the paper.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
