GitHub user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19621
I checked the failing SparkR tests. The trouble is in the failing
`glm` SparkR tests.
These tests compare the results of SparkR `glm` and R's own `glm` on the
"iris" test data. But what string indexer order does R's `glm` use? In the
"iris" dataset, the "Species" column has three values, "setosa",
"versicolor" and "virginica", and **each occurs exactly 50 times**. The
results match R's `glm` only when `RFormula` indexes them as
"setosa"->2, "versicolor"->0, "virginica"->1, which is a strange indexer order.
How can I set the string indexer order for R's `glm`?
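One possible explanation for the "strange" order, offered as an assumption rather than a confirmed answer: R sorts factor levels alphabetically, and its default treatment contrasts drop the *first* level as the baseline. The one-hot encoding behind `RFormula` drops the category with the *last* index. If both hold, "setosa" has to get the highest index so that both sides keep the same dummy columns. The plain-Python sketch below models only this column selection. The helper names are hypothetical, and no Spark or R API is called.

```python
# Hypothetical model of how dummy columns are chosen; neither Spark nor R is called here.

# ASSUMPTION: R's factor() sorts levels alphabetically, so iris$Species gets these levels.
LEVELS = sorted(["virginica", "setosa", "versicolor"])


def r_treatment_columns(levels):
    """Dummy columns kept by R's default treatment contrasts: every level except the first (baseline)."""
    return list(levels[1:])


def kept_by_drop_last(index_of):
    """Dummy columns kept by a one-hot encoder that drops the category with the highest index.

    ASSUMPTION: this mirrors the drop-last behaviour of the encoder used by RFormula.
    """
    ordered = sorted(index_of, key=index_of.get)  # categories in index order 0, 1, 2, ...
    return ordered[:-1]


def r_compatible_index(levels):
    """Hypothetical helper: give R's baseline level the last index, so dropping the last index drops it."""
    mapping = {level: i for i, level in enumerate(levels[1:])}
    mapping[levels[0]] = len(levels) - 1
    return mapping


if __name__ == "__main__":
    observed = {"setosa": 2, "versicolor": 0, "virginica": 1}  # order reported in the comment
    alphabetical = {"setosa": 0, "versicolor": 1, "virginica": 2}

    print(r_treatment_columns(LEVELS))      # ['versicolor', 'virginica']
    print(kept_by_drop_last(observed))      # ['versicolor', 'virginica']  (matches R)
    print(kept_by_drop_last(alphabetical))  # ['setosa', 'versicolor']    (does not match R)
    print(r_compatible_index(LEVELS))       # {'versicolor': 0, 'virginica': 1, 'setosa': 2}
```

Under this model, an alphabetical index would keep "setosa" as a dummy column and drop "virginica" instead. That would explain why only the reported order reproduces R's coefficients.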