[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

srowen Sat, 03 Dec 2016 11:36:55 -0800

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16129
  
    @felixcheung maybe you can advise me on this. I think this is a correct
    fix, but it ends up changing the results of decision forests a little bit.
    The SparkR test you wrote now fails:
    
    ```
    Failed -------------------------------------------------------------------------
    1. Failure: spark.randomForest (@test_mllib.R#937) -----------------------------
    predictions$prediction not equal to c(...).
    16/16 mismatches (average diff: 0.108)
    [1] 60.3 - 60.4 == -0.0508
    [2] 61.2 - 61.1 ==  0.1272
    [3] 60.7 - 60.6 ==  0.0543
    [4] 62.1 - 62.3 == -0.1473
    [5] 63.5 - 63.7 == -0.2044
    [6] 64.1 - 64.3 == -0.2413
    [7] 65.1 - 64.9 ==  0.2591
    [8] 64.3 - 64.3 ==  0.0045
    [9] 66.7 - 66.7 ==  0.0001
    ...
    ```
    
    Of course I can just paste in the new values, since I expect a small
    change in the result, but I wanted to sense-check it first. The new answers
    are closer to the answers in the nearly identical case above with 1 tree,
    which seems mildly encouraging.
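
One rough way to sense-check the size of the change is to look at the reported differences relative to the predictions themselves. The sketch below is only an illustration in Python, not part of the SparkR test: it uses just the nine rows visible in the output above (the other seven of the 16 are elided and are not guessed), and it assumes each row reads as `new - old == diff`.

```python
# Visible rows of the failed comparison, read as (new, old, reported diff).
# Assumption: the left value is the new prediction and the right value is the
# previously expected one. Only the 9 rows shown in the output are used.
rows = [
    (60.3, 60.4, -0.0508),
    (61.2, 61.1, 0.1272),
    (60.7, 60.6, 0.0543),
    (62.1, 62.3, -0.1473),
    (63.5, 63.7, -0.2044),
    (64.1, 64.3, -0.2413),
    (65.1, 64.9, 0.2591),
    (64.3, 64.3, 0.0045),
    (66.7, 66.7, 0.0001),
]

# Mean absolute difference over the visible rows only.
abs_diffs = [abs(diff) for _, _, diff in rows]
mean_abs_diff = sum(abs_diffs) / len(abs_diffs)

# Largest difference as a fraction of the old expected value.
max_rel_diff = max(abs(diff) / abs(old) for _, old, diff in rows)

print(f"mean |diff| over visible rows: {mean_abs_diff:.3f}")
print(f"max relative diff over visible rows: {max_rel_diff:.2%}")
```

On these visible rows the largest relative difference stays under half a percent, which is consistent with a small shift in the results rather than a large change in behavior.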



