GitHub user sethah commented on the issue:
https://github.com/apache/spark/pull/15314
The classifier test failures reveal a more fundamental issue. We tackle the
numeric label problem by having each algorithm convert the label column to
`DoubleType`. That means each algorithm is free to implement the conversion
differently, or to do certain things before casting (like calling `getDouble`
on a non-double column), which makes it intractable to write reasonable tests
for this. Adding tests that set the metadata and tests that don't is a
band-aid fix. My inclination is to make the cast happen in `Predictor.fit`, so
that _every_ algorithm goes through the _same_ code. The dataset passed to each
algorithm will then **always** have a `DoubleType` label column.
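To illustrate the proposal, here is a toy model in plain Scala (no Spark dependency) of the structure it implies: the base class's `fit` casts the label to `Double` exactly once, and every subclass's `train` can rely on that. `ToyPredictor`, `MeanPredictor`, and the `Map`-based rows are hypothetical stand-ins, not Spark classes; in Spark itself the cast would presumably be applied to the `Dataset`, e.g. with something like `col(labelCol).cast(DoubleType)`.

```scala
// Toy model of "cast once in fit": hypothetical stand-ins, NOT Spark's API.
object CastOnceSketch {
  // A "row" is a column-name -> value map; the label may be any numeric type.
  type Row = Map[String, Any]

  // Analogue of casting the label column to DoubleType.
  def toDouble(v: Any): Double = v match {
    case d: Double      => d
    case f: Float       => f.toDouble
    case i: Int         => i.toDouble
    case l: Long        => l.toDouble
    case s: Short       => s.toDouble
    case b: Byte        => b.toDouble
    case bd: BigDecimal => bd.toDouble
    case other          => throw new IllegalArgumentException(s"Non-numeric label: $other")
  }

  abstract class ToyPredictor(labelCol: String) {
    // The single place where the label is cast, before any algorithm-specific code.
    final def fit(data: Seq[Row]): Double = {
      val casted = data.map(r => r.updated(labelCol, toDouble(r(labelCol))))
      train(casted)
    }

    // Implementations may assume the label is already a Double.
    protected def train(data: Seq[Row]): Double
  }

  // Trivial "model" that returns the mean label. Reading the label with
  // asInstanceOf[Double] would throw ClassCastException on an Int label
  // if fit() had not already cast it.
  class MeanPredictor(labelCol: String) extends ToyPredictor(labelCol) {
    protected def train(data: Seq[Row]): Double =
      data.map(_(labelCol).asInstanceOf[Double]).sum / data.size
  }

  def main(args: Array[String]): Unit = {
    val intLabels: Seq[Row] = Seq(Map("label" -> 1), Map("label" -> 0), Map("label" -> 2))
    println(new MeanPredictor("label").fit(intLabels)) // prints 1.0
  }
}
```

Because `fit` is `final`, no implementation can skip or reorder the cast, which is what makes a single set of label-type tests sufficient in this model.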
If we decide to go that route, we should split it out into its own JIRA,
since this PR is only about weight columns. Pinging @BenFradet and @jkbradley,
who worked on the original PR: does it seem reasonable to do the cast in just
one place, `Predictor.fit`? cc @srowen if you have thoughts as well.