[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

XXXShao Fri, 08 Sep 2017 14:51:39 -0700

Github user XXXShao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16722#discussion_r137897201
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
    @@ -1002,9 +1018,9 @@ private[spark] object RandomForest extends Logging {
           val numSplits = metadata.numSplits(featureIndex)
     
           // get count for each distinct value
    -      val (valueCountMap, numSamples) = featureSamples.foldLeft((Map.empty[Double, Int], 0)) {
    -        case ((m, cnt), x) =>
    -          (m + ((x, m.getOrElse(x, 0) + 1)), cnt + 1)
    +      val (valueCountMap, numSamples) = featureSamples.foldLeft((Map.empty[Double, Double], 0.0)) {
    --- End diff --
    
    Hi, thanks for your contribution~ I have a question about using weight
    information in findSplitsForContinuousFeature here. It looks like continuous
    features will be influenced much more by instance weights, because the
    weights are applied twice: (1) when choosing the candidate splits and
    (2) when calculating impurity. In the limited set of papers I have read,
    weights are normally used only in the impurity calculation. Could you provide
    the reference you followed here? And please correct me if I have
    misunderstood your code. :) Thanks!
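
    To make the concern concrete, here is a minimal sketch of how weighted
    counts can move the candidate thresholds for a continuous feature. It is
    not the Spark implementation: the object WeightedSplitSketch, the function
    candidateThresholds, and its (value, weight) input are hypothetical names
    made up for illustration, and the stride logic is a simplification of
    quantile-style binning.

```scala
// Hypothetical, simplified sketch; NOT the code in RandomForest.scala.
object WeightedSplitSketch {

  /** Picks up to `numSplits` thresholds so that roughly equal total weight
    * falls between consecutive thresholds.
    *
    * @param samples   (featureValue, instanceWeight) pairs; weights must be positive
    * @param numSplits maximum number of thresholds to return
    */
  def candidateThresholds(samples: Seq[(Double, Double)], numSplits: Int): Seq[Double] = {
    require(numSplits >= 1, "numSplits must be at least 1")
    require(samples.forall(_._2 > 0.0), "weights must be positive")

    // Sum the weights per distinct feature value, then sort by value.
    val weightByValue = samples
      .groupBy(_._1)
      .map { case (value, group) => (value, group.map(_._2).sum) }
      .toSeq
      .sortBy(_._1)

    val totalWeight = weightByValue.map(_._2).sum
    val stride = totalWeight / (numSplits + 1)

    val thresholds = scala.collection.mutable.ArrayBuffer.empty[Double]
    var cumulative = 0.0
    var target = stride

    // Walk adjacent distinct values; emit their midpoint whenever the
    // cumulative weight reaches the next target.
    weightByValue.sliding(2).foreach {
      case Seq((value, weight), (next, _)) =>
        cumulative += weight
        if (cumulative >= target && thresholds.size < numSplits) {
          thresholds += (value + next) / 2.0
          // Move the target past the weight already consumed.
          while (target <= cumulative) target += stride
        }
      case _ => // fewer than two distinct values: nothing to split
    }
    thresholds.toSeq
  }
}
```

    With values 1.0 to 4.0 and numSplits = 1, unit weights give the single
    threshold 2.5, while weights of 9.0, 1.0, 1.0, 1.0 give 1.5. The heavily
    weighted instance therefore shifts the split candidate before impurity is
    evaluated, and the same weights are then used again in the impurity
    calculation.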


