[jira] [Commented] (SPARK-18710) Add offset to GeneralizedLinearRegression models
[ https://issues.apache.org/jira/browse/SPARK-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837306#comment-15837306 ]

Apache Spark commented on SPARK-18710:
--------------------------------------

User 'actuaryzhang' has created a pull request for this issue:
https://github.com/apache/spark/pull/16699

> Add offset to GeneralizedLinearRegression models
> ------------------------------------------------
>
>                 Key: SPARK-18710
>                 URL: https://issues.apache.org/jira/browse/SPARK-18710
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.0.2
>            Reporter: Wayne Zhang
>            Assignee: Wayne Zhang
>              Labels: features
>   Original Estimate: 10h
>  Remaining Estimate: 10h
>
> The current GeneralizedLinearRegression model does not support offset. The
> offset can be useful to take into account exposure, or for testing the
> incremental effect of new variables. Although it is possible to use weights
> in the current environment to achieve the same effect as specifying an offset
> for certain models (e.g., Poisson & Binomial with log offset), it is desirable
> to have the offset option to work with more general cases, e.g., negative
> offset or offset that is hard to specify using weights (e.g., offset to the
> probability rather than odds in logistic regression).
> Effort would involve:
> * update regression class to support offsetCol
> * update IWLS to take the offset into account
> * add test case for offset
> I can start working on this if the community approves this feature.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
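The exposure use case mentioned in the issue text can be illustrated with a small, self-contained sketch. It assumes a Poisson model with a log link and an offset equal to log(exposure), so the mean becomes exposure * exp(x . beta). The object and method names (ExposureOffsetSketch, linearPredictor, poissonMean) are hypothetical and are not part of Spark's API; plain arrays stand in for Spark's vector types.

```scala
object ExposureOffsetSketch {
  // Linear predictor without the offset: intercept + x . beta
  def linearPredictor(
      features: Array[Double],
      coefficients: Array[Double],
      intercept: Double): Double = {
    require(features.length == coefficients.length, "dimension mismatch")
    intercept + features.zip(coefficients).map { case (x, b) => x * b }.sum
  }

  // Poisson mean under a log link with offset = log(exposure):
  //   mu = exp(offset + x . beta) = exposure * exp(x . beta)
  // The offset enters the linear predictor with a fixed coefficient of 1.
  def poissonMean(
      features: Array[Double],
      coefficients: Array[Double],
      intercept: Double,
      exposure: Double): Double = {
    require(exposure > 0.0, "exposure must be positive to take its log")
    val offset = math.log(exposure)
    math.exp(offset + linearPredictor(features, coefficients, intercept))
  }
}
```

With the coefficients held fixed, doubling the exposure doubles the predicted count, which is the behaviour an exposure offset is meant to provide.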
[jira] [Commented] (SPARK-18710) Add offset to GeneralizedLinearRegression models
[ https://issues.apache.org/jira/browse/SPARK-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15779760#comment-15779760 ]

Wayne Zhang commented on SPARK-18710:
-------------------------------------

Thanks for the comment, Yanbo. In IRLS, the fit method expects RDD[Instance]. Does it still work if one feeds an RDD[GLRInstance] object to it?

{code}
def fit(instances: RDD[Instance]): IterativelyReweightedLeastSquaresModel = {
{code}
[jira] [Commented] (SPARK-18710) Add offset to GeneralizedLinearRegression models
[ https://issues.apache.org/jira/browse/SPARK-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767276#comment-15767276 ]

Yanbo Liang commented on SPARK-18710:
-------------------------------------

{{IterativelyReweightedLeastSquares}} is {{private[ml]}}, so we can change
{{val reweightFunc: (Instance, WeightedLeastSquaresModel) => (Double, Double)}}
to
{{val reweightFunc: (GLMInstance, WeightedLeastSquaresModel) => (Double, Double)}}.
What you should do is add corresponding docs to explain the meaning of {{offset}}. Thanks.
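A reweight function with the proposed shape could look roughly like the sketch below. This is not the Spark implementation. It assumes a Poisson family with a log link. GLMInstance is a hypothetical case class that carries the offset, and LinearModel is a hypothetical stand-in for {{WeightedLeastSquaresModel}}. The returned pair is the standard IRLS working label and working weight, with the offset added to the linear predictor and subtracted again from the working label.

```scala
object OffsetReweightSketch {
  // Hypothetical row type carrying an offset (not Spark's Instance).
  final case class GLMInstance(
      label: Double,
      weight: Double,
      offset: Double,
      features: Array[Double])

  // Hypothetical stand-in for WeightedLeastSquaresModel.
  final case class LinearModel(coefficients: Array[Double], intercept: Double) {
    def predict(features: Array[Double]): Double = {
      require(features.length == coefficients.length, "dimension mismatch")
      intercept + features.zip(coefficients).map { case (x, b) => x * b }.sum
    }
  }

  // Poisson family, log link: g(mu) = log(mu), g'(mu) = 1 / mu, V(mu) = mu.
  def poissonLogReweight(instance: GLMInstance, model: LinearModel): (Double, Double) = {
    // The offset is added to the linear predictor before applying the inverse link.
    val eta = model.predict(instance.features) + instance.offset
    val mu = math.exp(eta)
    // Working label z = eta + (y - mu) * g'(mu), with the offset removed so that
    // the next weighted least squares step fits only the x . beta part.
    val workingLabel = eta - instance.offset + (instance.label - mu) / mu
    // Working weight w / (g'(mu)^2 * V(mu)) simplifies to w * mu here.
    val workingWeight = instance.weight * mu
    (workingLabel, workingWeight)
  }
}
```

Setting the offset to 0.0 reduces this to the usual reweighting without an offset, so existing behaviour would be unchanged for callers that do not supply one.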
[jira] [Commented] (SPARK-18710) Add offset to GeneralizedLinearRegression models
[ https://issues.apache.org/jira/browse/SPARK-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765733#comment-15765733 ]

Wayne Zhang commented on SPARK-18710:
-------------------------------------

[~yanboliang] Thanks for the suggestion. I think the issue is a bit different in this case. IRLS relies on the _reweightFunc_, which is hard-coded to take an _Instance_:

{code}
val reweightFunc: (Instance, WeightedLeastSquaresModel) => (Double, Double)
{code}

I need to pass the offset column to this reweight function. Creating another GLRInstance won't solve the problem, will it?
[jira] [Commented] (SPARK-18710) Add offset to GeneralizedLinearRegression models
[ https://issues.apache.org/jira/browse/SPARK-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763622#comment-15763622 ]

Yanbo Liang commented on SPARK-18710:
-------------------------------------

We can add a new class for the GLR instance (named {{GLRInstance}}?) rather than modifying the existing {{Instance}}. We did something similar when implementing {{AFTSurvivalRegression}}; you can refer to {{AFTPoint}}. Thanks.
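The suggestion above might be sketched as follows. The field layout of GLRInstance is an assumption, as are the helper names toInstance and fromColumns. The Instance defined here is only a simplified stand-in that uses a plain array instead of Spark's vector type, so the block compiles without Spark on the classpath.

```scala
object GLRInstanceSketch {
  // Simplified stand-in for Spark's Instance (label, weight, features).
  final case class Instance(label: Double, weight: Double, features: Array[Double])

  // Hypothetical GLR-specific row type: the same fields plus a per-row offset,
  // so the shared Instance class stays untouched for other models.
  final case class GLRInstance(
      label: Double,
      weight: Double,
      offset: Double,
      features: Array[Double]) {
    // Drop the offset where a plain Instance is still required.
    def toInstance: Instance = Instance(label, weight, features)
  }

  // Build a GLRInstance from column values; a missing offset defaults to 0.0,
  // which corresponds to a model fitted without an offset.
  def fromColumns(
      label: Double,
      weight: Double,
      offset: Option[Double],
      features: Array[Double]): GLRInstance =
    GLRInstance(label, weight, offset.getOrElse(0.0), features)
}
```

Keeping the offset in a separate class means only the GLR code path has to know about it, which is the point of not modifying {{Instance}}.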
[jira] [Commented] (SPARK-18710) Add offset to GeneralizedLinearRegression models
[ https://issues.apache.org/jira/browse/SPARK-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762752#comment-15762752 ]

Wayne Zhang commented on SPARK-18710:
-------------------------------------

[~yanboliang] It seems that I would need to change the case class 'Instance' to include offset... That could be disruptive if many other models also depend on this case class. Any suggestions regarding this?
[jira] [Commented] (SPARK-18710) Add offset to GeneralizedLinearRegression models
[ https://issues.apache.org/jira/browse/SPARK-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748523#comment-15748523 ]

Yanbo Liang commented on SPARK-18710:
-------------------------------------

[~actuaryzhang] This proposal makes sense; please feel free to start the work. I have assigned this task to you. BTW, please don't set {{Fix version}} when you open a JIRA ticket; it should be set after the task is resolved. Thanks.

> Add offset to GeneralizedLinearRegression models
> ------------------------------------------------
>
>                 Key: SPARK-18710
>                 URL: https://issues.apache.org/jira/browse/SPARK-18710
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.0.2
>            Reporter: Wayne Zhang
>              Labels: features
>             Fix For: 2.2.0
>
>   Original Estimate: 10h
>  Remaining Estimate: 10h