[jira] [Commented] (SPARK-9610) Class and instance weighting for ML
[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581253#comment-16581253 ] Barry Becker commented on SPARK-9610:

All ML models should support setting an optional weighting column. Each weight should be a positive real number; if a weight value is not > 0, an error should be thrown. A weighting column is useful in several cases, such as when the class labels are very skewed, or when you just want some records to count more heavily than others. For example, you might want a dataset of cities to be weighted by population, or a dataset of products to be weighted by price.

> Class and instance weighting for ML
> ---
>
> Key: SPARK-9610
> URL: https://issues.apache.org/jira/browse/SPARK-9610
> Project: Spark
> Issue Type: Umbrella
> Components: ML
> Reporter: Joseph K. Bradley
> Priority: Major
>
> This umbrella is for tracking tasks for adding support for label or instance weights to ML algorithms. These additions will help handle skewed or imbalanced data, ensemble methods, etc.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
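For the skewed-labels case mentioned above, one common way to fill a weighting column is to derive per-row weights from class frequencies. The sketch below is plain Python, not Spark; it assumes the widely used "balanced" heuristic n / (k * count(label)), and the helper names `check_weights` and `balanced_class_weights` are hypothetical. The validation follows the comment's rule that every weight must be > 0.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def check_weights(weights: Sequence[float]) -> None:
    """Raise ValueError unless every weight is strictly positive.

    `not w > 0` is also true for NaN, so NaN weights are rejected too.
    """
    for i, w in enumerate(weights):
        if not w > 0:
            raise ValueError(f"weight at row {i} must be > 0, got {w}")


def balanced_class_weights(labels: Sequence[Hashable]) -> List[float]:
    """Hypothetical helper: per-row weight n / (k * count[label]).

    Every class ends up with the same total weight (n / k).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


# Skewed labels: three rows of class 0, one row of class 1.
weights = balanced_class_weights([0, 0, 0, 1])
check_weights(weights)  # passes: all weights are > 0
print(weights)  # class-0 rows get 4/6 each, the class-1 row gets 2.0
```

Both classes then carry a total weight of 2.0, so the rare class is no longer drowned out by the frequent one.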
[jira] [Commented] (SPARK-9610) Class and instance weighting for ML
[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219005#comment-16219005 ] Barry Becker commented on SPARK-9610:

Frequent item sets (associations) could use it too.
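For frequent item sets, a natural reading of instance weights is a weighted support: the share of total weight carried by transactions that contain the itemset, instead of the share of transaction count. This is a minimal plain-Python sketch under that assumption; `weighted_support` and the example baskets are hypothetical, not part of Spark's FPGrowth.

```python
from typing import FrozenSet, List, Tuple

# Each transaction is (set of items, weight); the weight is a hypothetical
# per-row column, e.g. how many customers share that basket.
Transaction = Tuple[FrozenSet[str], float]


def weighted_support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of the total weight carried by transactions containing `itemset`."""
    total = sum(w for _, w in transactions)
    covered = sum(w for items, w in transactions if itemset <= items)
    return covered / total


transactions = [
    (frozenset({"bread", "milk"}), 3.0),
    (frozenset({"bread"}), 1.0),
]
print(weighted_support(frozenset({"bread", "milk"}), transactions))  # 0.75 (unweighted: 0.5)
```

With all weights equal to 1 this reduces to the ordinary support count divided by the number of transactions.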
[jira] [Commented] (SPARK-9610) Class and instance weighting for ML
[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266269#comment-15266269 ] zhengruifeng commented on SPARK-9610:

[~josephkb] Clustering algorithms may need weighting too.
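To make "clustering with weights" concrete, here is one iteration of weighted k-means (Lloyd's algorithm), assuming the usual formulation where a point's weight scales its pull on the centroid, while the nearest-center assignment step is unaffected. It is a plain-Python sketch with hypothetical function names, not Spark's KMeans implementation.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def weighted_centroid(points: Sequence[Point], weights: Sequence[float]) -> List[float]:
    """Weighted mean of the points: sum_i w_i * x_i / sum_i w_i."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total for d in range(dim)]


def lloyd_step(points: Sequence[Point], weights: Sequence[float],
               centers: Sequence[Point]) -> List[List[float]]:
    """One weighted k-means iteration: assign to nearest center, then re-center."""
    clusters: List[List[int]] = [[] for _ in centers]
    for i, p in enumerate(points):
        # Assignment uses plain distance; weights only matter in the update.
        nearest = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        clusters[nearest].append(i)
    new_centers = []
    for c, members in enumerate(clusters):
        if not members:  # keep an empty cluster's center unchanged
            new_centers.append(list(centers[c]))
        else:
            new_centers.append(weighted_centroid([points[i] for i in members],
                                                 [weights[i] for i in members]))
    return new_centers


# The heavy point at 1.0 (weight 3) pulls the first center toward it.
print(lloyd_step([[0.0], [1.0], [10.0], [11.0]], [1.0, 3.0, 1.0, 1.0], [[0.0], [10.0]]))
# [[0.75], [10.5]]
```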
[jira] [Commented] (SPARK-9610) Class and instance weighting for ML
[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742064#comment-14742064 ] Nickolay Yakushev commented on SPARK-9610:

Thanks for the reply.
[jira] [Commented] (SPARK-9610) Class and instance weighting for ML
[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741376#comment-14741376 ] Joseph K. Bradley commented on SPARK-9610:

I see. I have not seen many use cases where you need to encode different semantics into "weight" like that. For most ML use cases I've seen, weights are handled in a standard manner, regardless of whether they represent a sample count, degree of trust, etc. Let's defer on the semantics for now.
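One way to see the "standard manner" of handling weights: in a weighted loss, each row's loss term is simply multiplied by its weight, so an integer weight gives exactly the same loss as duplicating that row. The sketch assumes squared error as the example loss; `weighted_sse` and `replicate` are hypothetical names in plain Python.

```python
from typing import List, Sequence, Tuple


def weighted_sse(y: Sequence[float], pred: Sequence[float], w: Sequence[float]) -> float:
    """Weighted squared-error loss: sum_i w_i * (y_i - pred_i)^2."""
    return sum(wi * (yi - pi) ** 2 for yi, pi, wi in zip(y, pred, w))


def replicate(y: Sequence[float], pred: Sequence[float],
              w: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Expand integer weights into duplicated rows (each then has weight 1)."""
    ys: List[float] = []
    ps: List[float] = []
    for yi, pi, wi in zip(y, pred, w):
        ys += [yi] * wi
        ps += [pi] * wi
    return ys, ps


y, pred, w = [1.0, 2.0, 4.0], [1.5, 2.0, 3.0], [2, 1, 3]
ys, ps = replicate(y, pred, w)
print(weighted_sse(y, pred, w), weighted_sse(ys, ps, [1] * len(ys)))  # 3.5 3.5
```

The same multiply-the-term rule applies whether the weight is read as a count or as a degree of trust; only non-integer weights lose the literal "duplicate the row" interpretation.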
[jira] [Commented] (SPARK-9610) Class and instance weighting for ML
[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738596#comment-14738596 ] Nickolay Yakushev commented on SPARK-9610:

Sometimes an algorithm for non-weighted data can be extended to weighted data in more than one way, depending on what the weight represents. I wish I could give a better example, but consider the cardinality of the union of two sets, c = |A ∪ B|:
* Non-weighted case (identical weights): A = {1}, B = {1}, c = 1
* Weight is a degree of truth: A = {1 -> 0.8}, B = {1 -> 0.5}, c = 0.8 or 1.0
* Weight is a quantity: A = {1 -> 0.8}, B = {1 -> 0.5}, c = 1.3

I don't know whether this makes a difference for any of the algorithms in the list.
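The three cases in this example can be reproduced with three different union rules. This is a plain-Python sketch under stated assumptions: the "degree of truth" values 0.8 and 1.0 are taken to come from two standard fuzzy unions (max, and the bounded sum min(1, x + y)), and the "quantity" value 1.3 from adding multiplicities. The function names are hypothetical.

```python
from typing import Dict

# A weighted set maps element -> weight.
WeightedSet = Dict[int, float]


def union_card_max(a: WeightedSet, b: WeightedSet) -> float:
    """Fuzzy union via the standard max t-conorm; cardinality = sum of memberships."""
    return sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in a.keys() | b.keys())


def union_card_bounded_sum(a: WeightedSet, b: WeightedSet) -> float:
    """Fuzzy union via the bounded sum min(1, x + y) (Lukasiewicz t-conorm)."""
    return sum(min(1.0, a.get(k, 0.0) + b.get(k, 0.0)) for k in a.keys() | b.keys())


def union_card_additive(a: WeightedSet, b: WeightedSet) -> float:
    """Weights as quantities: multiplicities add (multiset sum)."""
    return sum(a.get(k, 0.0) + b.get(k, 0.0) for k in a.keys() | b.keys())


A, B = {1: 0.8}, {1: 0.5}
print(union_card_max(A, B))          # 0.8
print(union_card_bounded_sum(A, B))  # 1.0
print(union_card_additive(A, B))     # 1.3
```

With all weights equal to 1, both fuzzy rules give c = 1 as in the non-weighted case, while the additive rule gives 2; that divergence is exactly why the interpretation of the weight can matter.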
[jira] [Commented] (SPARK-9610) Class and instance weighting for ML
[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734556#comment-14734556 ] Nickolay Yakushev commented on SPARK-9610:

1. Is basic statistics a good candidate for this list?
2. Should we somehow distinguish the nature of the weight, e.g. fuzzy set (degree of membership) versus multiset (quantity)?
3. Can weights be negative?
[jira] [Commented] (SPARK-9610) Class and instance weighting for ML
[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735945#comment-14735945 ] Joseph K. Bradley commented on SPARK-9610:

1. Basic stats sound reasonable, but they might be supported under DataFrames instead, since DataFrames are getting more and more stats functions. [~rxin], are there any plans to support row weights for DataFrame methods (where the weight would be a Double column in the DataFrame)?
2. Could you please clarify what you mean, and how those types differ?
3. I don't see a need for this, and it would complicate implementations.
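As an illustration of row-weighted basic stats, where the weight is a separate Double column, here is a plain-Python sketch of a weighted mean and variance. It is not a DataFrame API; `weighted_mean_var` is a hypothetical name, and, in line with point 3, it simply rejects negative weights.

```python
from typing import Sequence, Tuple


def weighted_mean_var(x: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean and weighted variance (divisor sum(w), i.e. no bias correction).

    Negative weights are rejected, and the total weight must be positive.
    """
    if any(wi < 0 for wi in w):
        raise ValueError("weights must be non-negative")
    total = sum(w)
    if total <= 0:
        raise ValueError("total weight must be positive")
    mean = sum(wi * xi for xi, wi in zip(x, w)) / total
    var = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w)) / total
    return mean, var


# A weight of 2 on the value 3.0 gives the same result as listing 3.0 twice.
print(weighted_mean_var([1.0, 2.0, 3.0], [1.0, 1.0, 2.0]))  # (2.25, 0.6875)
```

The divisor sum(w) is a deliberate simplification: an unbiased variance estimate depends on what the weights mean (for frequency weights the divisor becomes sum(w) - 1, while other kinds of weights need a different correction), which is one concrete place where the semantics question in point 2 does matter.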