[GitHub] [spark] zhengruifeng commented on issue #27337: [SPARK-30545][ML][PYSPARK] Impl Extremely Randomized Trees

GitBox Sat, 25 Jan 2020 05:22:25 -0800

zhengruifeng commented on issue #27337: [SPARK-30545][ML][PYSPARK] Impl Extremely Randomized Trees
URL: https://github.com/apache/spark/pull/27337#issuecomment-578405903
 
 
   @huaxingao Good point, you are right. scikit-learn does exactly this:
   ```
   # Draw a random threshold
   current.threshold = rand_uniform(min_feature_value,
                                    max_feature_value,
                                    random_state)
   ```
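For illustration, the scikit-learn behaviour above can be sketched in plain Python. This is a minimal sketch, not scikit-learn's code. It assumes a regression tree scored by size-weighted variance, and `extra_trees_split` and `weighted_variance` are hypothetical names. At a node, each sampled feature gets a single threshold drawn uniformly between that feature's min and max at the node, and the best of these random candidates is kept.

```python
import random
from statistics import pvariance


def weighted_variance(left, right):
    """Size-weighted variance of the two children (lower is better)."""
    n = len(left) + len(right)
    return (len(left) * pvariance(left) + len(right) * pvariance(right)) / n


def extra_trees_split(X, y, max_features, rng):
    """Hypothetical ET node split: one uniform random threshold per sampled feature.

    Returns (score, feature, threshold) for the best candidate, or None.
    """
    n_features = len(X[0])
    best = None
    for f in rng.sample(range(n_features), max_features):
        values = [row[f] for row in X]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature at this node: no valid threshold
        # Analogous to rand_uniform(min_feature_value, max_feature_value, random_state)
        threshold = rng.uniform(lo, hi)
        left = [t for row, t in zip(X, y) if row[f] <= threshold]
        right = [t for row, t in zip(X, y) if row[f] > threshold]
        if not left or not right:
            continue  # degenerate draw, e.g. the threshold landed on the maximum
        score = weighted_variance(left, right)
        if best is None or score < best[0]:
            best = (score, f, threshold)
    return best


rng = random.Random(0)
X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
y = [0.0, 0.0, 1.0, 1.0]
print(extra_trees_split(X, y, max_features=2, rng=rng))
```

Here feature 1 is constant and is skipped, so the split always lands on feature 0 at some exact value in [0, 3).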
   
   In scikit-learn, both RF and ET work directly on the raw feature values, so the thresholds they choose are exact.
   However, in MLlib all tree models are built on binned datasets (`treePoints`). To stay consistent with the other tree models and to minimize changes, I prefer to randomly draw a split from the candidate splits built at the beginning of training. Otherwise, I would need to implement a completely new method for finding exact thresholds.
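The proposed alternative can be sketched the same way, again as a hypothetical illustration rather than Spark code. `candidate_splits` is an assumed stand-in for the split finding done once at the start of training (it is not MLlib's actual binning logic), and `random_binned_split` draws one of those precomputed thresholds per sampled feature instead of a uniform value in [min, max]. A real binned implementation would aggregate per-bin statistics over `treePoints` rather than rescanning raw rows as this sketch does.

```python
import random
from statistics import pvariance


def candidate_splits(values, max_bins):
    """Hypothetical helper: compute at most max_bins - 1 thresholds for one feature, once."""
    distinct = sorted(set(values))
    if len(distinct) <= 1:
        return []
    if len(distinct) <= max_bins:
        # Few distinct values: use midpoints between neighbours.
        return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    # Many distinct values: evenly spaced cut points over the sorted distinct values.
    step = len(distinct) / max_bins
    return sorted({distinct[int(i * step)] for i in range(1, max_bins)})


def random_binned_split(X, y, splits_per_feature, max_features, rng):
    """ET-style node split restricted to precomputed splits.

    For each sampled feature, draw one of its precomputed thresholds at random
    and keep the best candidate by size-weighted variance.
    """
    best = None
    for f in rng.sample(range(len(splits_per_feature)), max_features):
        if not splits_per_feature[f]:
            continue
        threshold = rng.choice(splits_per_feature[f])
        left = [t for row, t in zip(X, y) if row[f] <= threshold]
        right = [t for row, t in zip(X, y) if row[f] > threshold]
        if not left or not right:
            continue  # this split is empty on one side at the current node
        n = len(left) + len(right)
        score = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
        if best is None or score < best[0]:
            best = (score, f, threshold)
    return best


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 1.0, 1.0]
# Splits are computed once, before any tree is grown.
splits = [candidate_splits([row[f] for row in X], max_bins=32) for f in range(len(X[0]))]
print(splits)  # [[0.5, 1.5, 2.5]]
print(random_binned_split(X, y, splits, max_features=1, rng=random.Random(0)))
```

Because thresholds can only come from the precomputed set, the randomness is coarser than drawing from [min, max], but no new exact-threshold search is needed.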


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

