subject:"\[jira\] \[Commented\] \(SPARK\-6162\) Handle missing values in GBM"

[jira] [Commented] (SPARK-6162) Handle missing values in GBM

2018-03-27 Thread Barry Becker (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415600#comment-16415600
 ] 

Barry Becker commented on SPARK-6162:
-

If we all agree that is is something that would be very nice to have, why is it 
closed as won't fix instead of just being deferred to a future release?

This seems like a big limitation of spark Tree models in Spark.

> Handle missing values in GBM
> 
>
> Key: SPARK-6162
> URL: https://issues.apache.org/jira/browse/SPARK-6162
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.2.1
>Reporter: Devesh Parekh
>Priority: Major
>
> We build a lot of predictive models over data combined from multiple sources, 
> where some entries may not have all sources of data and so some values are 
> missing in each feature vector. Another place this might come up is if you 
> have features from slightly heterogeneous items (or items composed of 
> heterogeneous subcomponents) that share many features in common but may have 
> extra features for different types, and you don't want to manually train 
> models for every different type.
> R's GBM library, which is what we are currently using, deals with this type 
> of data nicely by making "missing" nodes in the decision tree (a surrogate 
> split) for features that can have missing values. We'd like to do the same 
> with MLLib, but LabeledPoint would need to support missing values, and 
> GradientBoostedTrees would need to be modified to deal with them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-6162) Handle missing values in GBM

2016-03-06 Thread Joseph K. Bradley (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182434#comment-15182434
 ] 

Joseph K. Bradley commented on SPARK-6162:
--

I agree this will be nice to add someday, but it's less pressing than other 
tasks for now.

> Handle missing values in GBM
> 
>
> Key: SPARK-6162
> URL: https://issues.apache.org/jira/browse/SPARK-6162
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.2.1
>Reporter: Devesh Parekh
>
> We build a lot of predictive models over data combined from multiple sources, 
> where some entries may not have all sources of data and so some values are 
> missing in each feature vector. Another place this might come up is if you 
> have features from slightly heterogeneous items (or items composed of 
> heterogeneous subcomponents) that share many features in common but may have 
> extra features for different types, and you don't want to manually train 
> models for every different type.
> R's GBM library, which is what we are currently using, deals with this type 
> of data nicely by making "missing" nodes in the decision tree (a surrogate 
> split) for features that can have missing values. We'd like to do the same 
> with MLLib, but LabeledPoint would need to support missing values, and 
> GradientBoostedTrees would need to be modified to deal with them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-6162) Handle missing values in GBM

[jira] [Commented] (SPARK-6162) Handle missing values in GBM

2 matches

Site Navigation

Mail list logo

Footer information