[GitHub] [spark] huaxingao commented on issue #27337: [SPARK-30545][ML][PYSPARK] Impl Extremely Randomized Trees

GitBox Fri, 24 Jan 2020 22:52:17 -0800

huaxingao commented on issue #27337: [SPARK-30545][ML][PYSPARK] Impl Extremely 
Randomized Trees
URL: https://github.com/apache/spark/pull/27337#issuecomment-578382984
 
 
   @zhengruifeng 
   I took a quick look of how the random split is picked.
   
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.7485&rep=rep1&type=pdf
   
   
![image](https://user-images.githubusercontent.com/13592258/73117398-e18b4100-3ef9-11ea-80a6-09f94398e215.png)
   
   
   It seems to me that in the paper, the random threshold is found by sampling 
from the continuous uniform distribution [min(feature), max(feature)]. Seems to 
me that in your implementation, the random threshold is sampled uniformly (and 
discretely) from the set of possible splits generated by findSplits(). I know 
your way is simpler, but I am not sure if it is close enough to the original 
method in the paper.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] huaxingao commented on issue #27337: [SPARK-30545][ML][PYSPARK] Impl Extremely Randomized Trees

Reply via email to