[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes

sethah Thu, 06 Oct 2016 13:24:16 -0700

Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/15314
  
    The failing classifier tests reveal a more fundamental 
issue. We handle numeric labels by having each algorithm convert the 
label column to `DoubleType` itself. That leaves each algorithm free to 
implement the conversion differently, or to do certain things before casting 
(like calling `getDouble` on a non-double column), and it becomes intractable 
to write reasonable tests for this. Adding one test with metadata set and one 
without is a band-aid fix. My inclination is to make the cast happen in 
`Predictor.fit`, so that _every_ algorithm runs the _same_ code. The dataset 
passed in to each algorithm will then **always** have a `DoubleType` label column. 
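
    To make the proposal concrete, here is a minimal sketch of the pattern in 
plain Python. It is not Spark code and does not use the Spark API: the classes 
`Predictor` and `MeanLabelPredictor`, the `train` hook, and the list-of-dicts 
"dataset" are all hypothetical stand-ins, assumed only for illustration. The 
point is that the cast lives in one shared `fit`, so subclasses never see a 
non-float label.

```python
from abc import ABC, abstractmethod


class Predictor(ABC):
    """Toy stand-in for a shared base class; NOT Spark's actual Predictor."""

    def __init__(self, label_col="label"):
        self.label_col = label_col

    def fit(self, rows):
        # Cast the label once, here, so every subclass receives float labels.
        casted = [
            {**row, self.label_col: float(row[self.label_col])} for row in rows
        ]
        return self.train(casted)

    @abstractmethod
    def train(self, rows):
        """Subclasses may assume the label column already holds floats."""


class MeanLabelPredictor(Predictor):
    """Hypothetical algorithm: its 'model' is just the mean label."""

    def train(self, rows):
        labels = [row[self.label_col] for row in rows]
        # No per-algorithm casting needed; fit() already guaranteed the type.
        assert all(isinstance(value, float) for value in labels)
        return sum(labels) / len(labels)


# Integer labels go in; the algorithm only ever sees floats.
int_labels = [{"label": 1}, {"label": 0}, {"label": 1}, {"label": 0}]
model = MeanLabelPredictor().fit(int_labels)
```

    With this shape, a single test against the shared `fit` would cover the 
label-type handling for every algorithm, instead of one test per algorithm.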
    
    If we decide to go the route above, we should split it into its own 
JIRA, since this PR covers only weight columns. Ping @BenFradet @jkbradley, who 
worked on the original PR: does it seem reasonable to do the cast in just one 
place, `Predictor.fit`? cc @srowen if you have thoughts as well.


