[jira] [Comment Edited] (SPARK-19234) AFTSurvivalRegression chokes silently or with confusing errors when any labels are zero
[ https://issues.apache.org/jira/browse/SPARK-19234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830037#comment-15830037 ]

Yanbo Liang edited comment on SPARK-19234 at 1/19/17 2:36 PM:
--------------------------------------------------------------

[~admackin] Nice catch. As [~srowen] said, the AFT survival model regresses the log of the failure time, so it is invalid when a failure time is zero. I think the correct fix is to throw an error for any non-positive failure time. I double-checked with R, which throws an error when a failure time is zero:
{code}
library(survival)  # provides Surv() and survreg()

data <- list(time = c(1.218, 0.0, 3.627, 0.273, 4.199),
             censor = c(1.0, 0.0, 0.0, 1.0, 0.0),
             a = c(1.56, 0.346, 1.38, 0.52, 0.795),
             b = c(-0.605, 2.158, 0.231, 1.151, -0.226))
model <- survreg(Surv(time, censor) ~ a + b, data)
Error in survreg(Surv(time, censor) ~ a + b, data) :
  Invalid survival times for this distribution
{code}

> AFTSurvivalRegression chokes silently or with confusing errors when any labels are zero
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-19234
>                 URL: https://issues.apache.org/jira/browse/SPARK-19234
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>         Environment: spark-shell or pyspark
>            Reporter: Andrew MacKinlay
>            Priority: Minor
>         Attachments: spark-aft-failure.txt
>
>
> If you try to use AFTSurvivalRegression and any label in your input data is
> 0.0, you get coefficients of 0.0 returned and, in many cases, errors like
> this:
> {{17/01/16 15:10:50 ERROR StrongWolfeLineSearch: Encountered bad values in
> function evaluation. Decreasing step size to NaN}}
> Zero should, I think, be an allowed value for survival analysis. I don't know
> enough about AFT to say whether this is a pathological case for it
> specifically, but this behaviour is clearly undesirable. If you have any labels
> of 0.0, you get either a) obscure error messages that give no hint of the
> cause, plus coefficients that are all zero, or b) no error messages at all and
> coefficients of zero (arguably worse, since you don't even have console
> output to tell you something has gone awry). If AFT doesn't work with
> zero-valued labels, Spark should fail fast and let the developer know why. If
> it does, we should get results here.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-19234) AFTSurvivalRegression chokes silently or with confusing errors when any labels are zero
[ https://issues.apache.org/jira/browse/SPARK-19234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830037#comment-15830037 ]

Yanbo Liang edited comment on SPARK-19234 at 1/19/17 2:33 PM:
--------------------------------------------------------------

[~admackin] Nice catch. As [~srowen] said, the AFT survival model regresses the log of the failure time, so it is invalid when a failure time is zero. I think the correct fix is to throw an error for any non-positive failure time. I double-checked with R, which throws an error when a failure time is zero:
{code}
library(survival)  # provides Surv() and survreg()

data <- list(time = c(1.218, 0.0, 3.627, 0.273, 4.199),
             censor = c(1.0, 0.0, 0.0, 1.0, 0.0),
             a = c(1.56, 0.346, 1.38, 0.52, 0.795),
             b = c(-0.605, 2.158, 0.231, 1.151, -0.226))
model <- survreg(Surv(time, censor) ~ a + b, data)
Error in survreg(Surv(time, censor) ~ a + b, data) :
  Invalid survival times for this distribution
{code}
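The fail-fast check suggested in the comment above could look roughly like the following. This is a minimal sketch in plain Python, not Spark code: the function names first_invalid_label and log_failure_times are hypothetical, and nothing here reflects how Spark implements or will implement the fix. It only illustrates why a zero label breaks the log transform that AFT relies on, and how rejecting non-positive failure times up front produces a clear error instead of NaN step sizes.

```python
import math


def first_invalid_label(labels):
    """Return the index of the first label that is not a finite positive number, or None."""
    for i, t in enumerate(labels):
        if not (math.isfinite(t) and t > 0.0):
            return i
    return None


def log_failure_times(labels):
    """Return log(t) for each failure time t, failing fast on invalid values.

    Hypothetical helper: the AFT model works with log(failure time), and
    log(0) is undefined, so zero, negative, NaN or infinite labels are
    rejected before any optimisation would start.
    """
    bad = first_invalid_label(labels)
    if bad is not None:
        raise ValueError(
            f"failure times must be positive, but label at index {bad} is {labels[bad]}"
        )
    return [math.log(t) for t in labels]
```

Applied to the time vector from the R example, first_invalid_label points at index 1 (the 0.0 entry), and log_failure_times raises a ValueError naming that label rather than returning anything.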
[jira] [Comment Edited] (SPARK-19234) AFTSurvivalRegression chokes silently or with confusing errors when any labels are zero
[ https://issues.apache.org/jira/browse/SPARK-19234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827236#comment-15827236 ]

Andrew MacKinlay edited comment on SPARK-19234 at 1/18/17 1:53 AM:
-------------------------------------------------------------------

[~yanboliang] Judging by the GitHub commits, I presume you are the author of this code? Do you have an opinion on this?