[jira] [Updated] (SPARK-5133) Feature Importance for Decision Tree (Ensembles)
[ https://issues.apache.org/jira/browse/SPARK-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-5133:

Target Version/s: 1.5.0
Remaining Estimate: 168h
Original Estimate: 168h

Feature Importance for Decision Tree (Ensembles)

Key: SPARK-5133
URL: https://issues.apache.org/jira/browse/SPARK-5133
Project: Spark
Issue Type: New Feature
Components: ML, MLlib
Reporter: Peter Prettenhofer
Original Estimate: 168h
Remaining Estimate: 168h

Add feature importance to decision tree model and tree ensemble models.
If people are interested in this feature I could implement it given a mentor (API decisions, etc).

Please find a description of the feature below:

Decision trees intrinsically perform feature selection by selecting appropriate split points. This information can be used to assess the relative importance of a feature.
Relative feature importance gives valuable insight into a decision tree or tree ensemble and can even be used for feature selection.

More information on feature importance (via decrease in impurity) can be found in ESLII (10.13.1) or here [1].
R's randomForest package uses a different technique for assessing variable importance that is based on permutation tests.

All necessary information to create relative importance scores should be available in the tree representation (class Node; split, impurity gain, (weighted) nr of samples?).

[1] http://scikit-learn.org/stable/modules/ensemble.html#feature-importance-evaluation

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
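For illustration only, the impurity-decrease importance mentioned in the description could be sketched as below. This is not Spark code and not the proposed API: the `Node` class, its field names, and both function names are hypothetical stand-ins for the "split, impurity gain, (weighted) nr of samples" information the description refers to. The sketch assumes each split's gain is weighted by the number of samples reaching the node, summed per feature, and normalized to sum to 1, and that an ensemble simply averages the per-tree scores. Other weightings are possible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; not Spark's actual Node class."""
    n_samples: float                 # (weighted) number of samples reaching the node
    feature: Optional[int] = None    # index of the split feature; None for a leaf
    gain: float = 0.0                # impurity decrease achieved by the split
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def feature_importances(root: Node, n_features: int) -> List[float]:
    """Sum n_samples * gain per split feature, then normalize to sum to 1."""
    totals = [0.0] * n_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node.feature is None:     # leaf: contributes nothing
            continue
        totals[node.feature] += node.n_samples * node.gain
        stack.extend(c for c in (node.left, node.right) if c is not None)
    norm = sum(totals)
    # A tree with no splits has no importance to distribute.
    return [t / norm for t in totals] if norm > 0 else totals


def ensemble_importances(roots: List[Node], n_features: int) -> List[float]:
    """Average the per-tree normalized importances (one possible choice)."""
    per_tree = [feature_importances(r, n_features) for r in roots]
    return [sum(col) / len(per_tree) for col in zip(*per_tree)]


# Toy tree: the root splits on feature 0, its left child on feature 1,
# and feature 2 is never used.
example_root = Node(
    n_samples=100, feature=0, gain=0.2,
    left=Node(n_samples=60, feature=1, gain=0.1,
              left=Node(n_samples=30), right=Node(n_samples=30)),
    right=Node(n_samples=40),
)

if __name__ == "__main__":
    print(feature_importances(example_root, n_features=3))
```

In the toy tree, feature 0 receives most of the importance because its split is both higher in gain and applied to more samples, while the unused feature 2 scores zero.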
[jira] [Updated] (SPARK-5133) Feature Importance for Decision Tree (Ensembles)
[ https://issues.apache.org/jira/browse/SPARK-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-5133:

Priority: Major (was: Minor)

Feature Importance for Decision Tree (Ensembles)

Key: SPARK-5133
URL: https://issues.apache.org/jira/browse/SPARK-5133
Project: Spark
Issue Type: New Feature
Components: ML, MLlib
Reporter: Peter Prettenhofer
[jira] [Updated] (SPARK-5133) Feature Importance for Decision Tree (Ensembles)
[ https://issues.apache.org/jira/browse/SPARK-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Prettenhofer updated SPARK-5133:

Description:
Add feature importance to decision tree model and tree ensemble models.
If people are interested in this feature I could implement it given a mentor (API decisions, etc).

Please find a description of the feature below:

Decision trees intrinsically perform feature selection by selecting appropriate split points. This information can be used to assess the relative importance of a feature.
Relative feature importance gives valuable insight into a decision tree or tree ensemble and can even be used for feature selection.

More information on feature importance (via decrease in impurity) can be found in ESLII (10.13.1) or here [1].
R's randomForest package uses a different technique for assessing variable importance that is based on permutation tests.

All necessary information to create relative importance scores should be available in the tree representation (class Node; split, impurity gain, (weighted) nr of samples?).

[1] http://scikit-learn.org/stable/modules/ensemble.html#feature-importance-evaluation

was:
Add feature importance to decision tree model and tree ensemble models.
If people are interested in this feature I could implement it given a mentor (API decisions, etc).

Please find a description of the feature below:

Decision trees intrinsically perform feature selection by selecting appropriate split points. This information can be used to assess the relative importance of a feature.
Relative feature importance gives valuable insight into a decision tree or tree ensemble and can even be used for feature selection.

All necessary information to create relative importance scores should be available in the tree representation (class Node; split, impurity gain, (weighted) nr of samples?).
Feature Importance for Decision Tree (Ensembles)

Key: SPARK-5133
URL: https://issues.apache.org/jira/browse/SPARK-5133
Project: Spark
Issue Type: New Feature
Components: ML, MLlib
Reporter: Peter Prettenhofer
Priority: Minor
[jira] [Updated] (SPARK-5133) Feature Importance for Decision Tree (Ensembles)
[ https://issues.apache.org/jira/browse/SPARK-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Prettenhofer updated SPARK-5133:

Summary: Feature Importance for Decision Tree (Ensembles) (was: Feature Importance for Tree (Ensembles))

Feature Importance for Decision Tree (Ensembles)

Key: SPARK-5133
URL: https://issues.apache.org/jira/browse/SPARK-5133
Project: Spark
Issue Type: New Feature
Components: ML, MLlib
Reporter: Peter Prettenhofer
Priority: Minor