[jira] [Updated] (SPARK-23562) RFormula handleInvalid should handle invalid values in non-string columns.

2018-04-10 Thread Joseph K. Bradley (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-23562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-23562:
--
Shepherd: Joseph K. Bradley

> RFormula handleInvalid should handle invalid values in non-string columns.
> --
>
> Key: SPARK-23562
> URL: https://issues.apache.org/jira/browse/SPARK-23562
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.3.0
> Reporter: Bago Amirbekian
> Priority: Major
> Fix For: 2.4.0
>
>
> Currently, when handleInvalid is set to 'keep' or 'skip', the setting applies 
> only to String fields. Null values in numeric fields either cause the 
> transformer to fail or may end up as nulls in the resulting label column.
> I'm not sure what the semantics of 'keep' should be for numeric columns with 
> null values, but we should at least be able to support 'skip' for these types.
> --> Discussed offline: for "keep", null values can be converted to NaN values
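
The semantics discussed above can be illustrated with a small sketch. This is
plain Python, not Spark code, and it is not the RFormula implementation: the
function name handle_invalid_numeric and the row representation (a list of
dicts) are assumptions made only for illustration. It assumes the usual three
handleInvalid modes ('error', 'skip', 'keep') and applies the rule proposed in
the issue: 'skip' drops rows with a null numeric value, and 'keep' converts
the null to NaN.

```python
import math
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def handle_invalid_numeric(rows: List[Row], mode: str) -> List[Dict[str, float]]:
    """Hypothetical helper mimicking the proposed numeric-null semantics.

    'error' raises on the first null, 'skip' drops rows containing a null,
    and 'keep' retains the row with each null replaced by NaN.
    """
    if mode not in ("error", "skip", "keep"):
        raise ValueError(f"unsupported handleInvalid mode: {mode!r}")

    result = []
    for row in rows:
        has_null = any(value is None for value in row.values())
        if not has_null:
            result.append(dict(row))
        elif mode == "error":
            raise ValueError(f"null value in numeric column: {row!r}")
        elif mode == "skip":
            continue  # drop the whole row
        else:  # mode == "keep"
            result.append(
                {name: math.nan if value is None else value
                 for name, value in row.items()}
            )
    return result


rows = [{"x": 1.0, "label": 0.0}, {"x": None, "label": 1.0}]
skipped = handle_invalid_numeric(rows, "skip")  # only the first row survives
kept = handle_invalid_numeric(rows, "keep")     # second row has x = NaN
```

With 'keep', downstream stages still see a numeric value in every row, which
is the reason given for choosing NaN over leaving the null in place.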



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23562) RFormula handleInvalid should handle invalid values in non-string columns.

2018-04-03 Thread Joseph K. Bradley (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-23562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-23562:
--
Description: 
Currently, when handleInvalid is set to 'keep' or 'skip', the setting applies 
only to String fields. Null values in numeric fields either cause the 
transformer to fail or may end up as nulls in the resulting label column.

I'm not sure what the semantics of 'keep' should be for numeric columns with 
null values, but we should at least be able to support 'skip' for these types.
--> Discussed offline: for "keep", null values can be converted to NaN values

  was:
Currently when handleInvalid is set to 'keep' or 'skip' this only applies to 
String fields. Numeric fields that are null will either cause the transformer 
to fail or might be null in the resulting label column.

I'm not sure what the semantics of keep might be for numeric columns with null 
values, but we should be able to at least support skip for these types.


> RFormula handleInvalid should handle invalid values in non-string columns.
> --
>
> Key: SPARK-23562
> URL: https://issues.apache.org/jira/browse/SPARK-23562
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.3.0
> Reporter: Bago Amirbekian
> Priority: Major
>
> Currently, when handleInvalid is set to 'keep' or 'skip', the setting applies 
> only to String fields. Null values in numeric fields either cause the 
> transformer to fail or may end up as nulls in the resulting label column.
> I'm not sure what the semantics of 'keep' should be for numeric columns with 
> null values, but we should at least be able to support 'skip' for these types.
> --> Discussed offline: for "keep", null values can be converted to NaN values


