[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

WeichenXu123 Fri, 24 Nov 2017 00:56:26 -0800

Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/19621
  
    I checked the failing SparkR tests. The failures are in the `glm` 
SparkR tests.
    These tests compare SparkR `glm` results against R's native `glm` on the 
"iris" test data, but what string indexer order does R's `glm` use? I checked 
the "iris" dataset: the "Species" column has three values, "setosa", 
"versicolor", and "virginica", and **each of them occurs 50 times**. The 
results match R's `glm` only when `RFormula` indexes them as 
"setosa"->2, "versicolor"->0, "virginica"->1. This is a strange indexer order.
    How should the string indexer order be set to match R's `glm`?
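
A minimal sketch of the situation described above, in plain Python with no Spark or R dependency. It shows that a frequency-based order cannot distinguish the three labels, and it derives one ordering that reproduces the mapping quoted above. It rests on two assumptions that are not stated in this comment: that R sorts factor levels alphabetically and uses the first level as the reference category, and that the Spark side drops the last index when encoding. The names `species`, `freq_keys`, `r_levels`, and `mapping` are illustrative only and are not Spark or R APIs.

```python
from collections import Counter

# The three "Species" labels in iris, 50 rows each (counts taken from the comment).
species = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50

counts = Counter(species)

# Sort key for a frequency-descending order: every label gets the same key,
# so the resulting order depends entirely on how ties are broken.
freq_keys = {label: -counts[label] for label in counts}

# ASSUMPTION: R sorts factor levels alphabetically and treats the first
# level as the reference (baseline) category.
r_levels = sorted(set(species))
reference = r_levels[0]

# ASSUMPTION: the encoder on the Spark side drops the LAST index, so the
# reference level has to be indexed last for the two models to line up.
# The remaining levels keep their alphabetical order.
mapping = {label: i for i, label in enumerate(r_levels[1:] + [reference])}

print(freq_keys)
print(mapping)
```

Under those assumptions the mapping comes out as "versicolor"->0, "virginica"->1, "setosa"->2, which is the order reported above. Whether this is the actual cause of the test failures would need to be confirmed against the SparkR test code.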



