[jira] [Comment Edited] (SPARK-19234) AFTSurvivalRegression chokes silently or with confusing errors when any labels are zero

2017-01-19 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830037#comment-15830037
 ] 

Yanbo Liang edited comment on SPARK-19234 at 1/19/17 2:36 PM:
--

[~admackin] Nice catch. As [~srowen] said, the AFT survival model regresses 
the log of the failure time (log T = beta'x + sigma*epsilon), so log(0) is 
undefined and a failure time of zero is invalid. I think the correct fix is to 
throw an error for non-positive failure times. Double-checking with R, 
{{survreg}} throws an error when a failure time is zero:
{code}
library(survival)

data <- list(time = c(1.218, 0.0, 3.627, 0.273, 4.199),
             censor = c(1.0, 0.0, 0.0, 1.0, 0.0),
             a = c(1.56, 0.346, 1.38, 0.52, 0.795),
             b = c(-0.605, 2.158, 0.231, 1.151, -0.226))
model <- survreg(Surv(time, censor) ~ a + b, data)

Error in survreg(Surv(time, censor) ~ a + b, data) : 
  Invalid survival times for this distribution
{code}
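
For comparison, a rough PySpark sketch of the same five observations 
(illustrative only, not code from the original report; the column names follow 
the AFTSurvivalRegression defaults). Per the issue description quoted below, 
the fit currently comes back with all-zero coefficients instead of raising a 
clear error:
{code}
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.regression import AFTSurvivalRegression

spark = SparkSession.builder.getOrCreate()

# Same data as the R snippet above; the second row has a failure time of 0.0.
training = spark.createDataFrame([
    (1.218, 1.0, Vectors.dense(1.560, -0.605)),
    (0.0,   0.0, Vectors.dense(0.346,  2.158)),
    (3.627, 0.0, Vectors.dense(1.380,  0.231)),
    (0.273, 1.0, Vectors.dense(0.520,  1.151)),
    (4.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor", "features"])

model = AFTSurvivalRegression().fit(training)
# Reported behaviour: all-zero coefficients (often preceded by
# StrongWolfeLineSearch errors in the logs) rather than a clear failure.
print(model.coefficients)
{code}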



> AFTSurvivalRegression chokes silently or with confusing errors when any 
> labels are zero
> ---
>
> Key: SPARK-19234
> URL: https://issues.apache.org/jira/browse/SPARK-19234
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.1.0
> Environment: spark-shell or pyspark
>Reporter: Andrew MacKinlay
>Priority: Minor
> Attachments: spark-aft-failure.txt
>
>
> If you try to use AFTSurvivalRegression and any label in your input data is 
> 0.0, you get coefficients of 0.0 returned, and in many cases errors like 
> this:
> {{17/01/16 15:10:50 ERROR StrongWolfeLineSearch: Encountered bad values in 
> function evaluation. Decreasing step size to NaN}}
> Zero should, I think, be an allowed value for survival analysis. I don't know 
> whether this is a pathological case for AFT specifically, as I don't know 
> enough about it, but this behaviour is clearly undesirable. If you have any 
> labels of 0.0, you get either a) obscure error messages, with no indication 
> of the cause, and coefficients which are all zero, or b) no error messages at 
> all and coefficients of zero (arguably worse, since you don't even have 
> console output to tell you something has gone awry). If AFT doesn't work with 
> zero-valued labels, Spark should fail fast and let the developer know why. If 
> it does, we should get results here.
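
The description asks Spark to fail fast. Until a check like the one proposed 
above lands in AFTSurvivalRegression itself, a caller-side guard along these 
lines would surface the problem early (a hypothetical helper, not part of the 
Spark API, assuming the {{training}} DataFrame from the PySpark sketch above):
{code}
from pyspark.sql import functions as F

def require_positive_failure_times(df, label_col="label"):
    """Raise early if any failure time (label) is non-positive."""
    bad = df.filter(F.col(label_col) <= 0.0).count()
    if bad > 0:
        raise ValueError(
            "AFT survival regression models log(failure time); found %d "
            "non-positive value(s) in column '%s'" % (bad, label_col))

require_positive_failure_times(training)  # raises ValueError for the data above
{code}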



[jira] [Comment Edited] (SPARK-19234) AFTSurvivalRegression chokes silently or with confusing errors when any labels are zero

2017-01-17 Thread Andrew MacKinlay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827236#comment-15827236
 ] 

Andrew MacKinlay edited comment on SPARK-19234 at 1/18/17 1:53 AM:
---

[~yanboliang] I presume that you are the author, judging by the GitHub 
commits? Do you have an opinion on this? 


was (Author: admackin):
[~yanbo] I presume that you are the author judging by Github commits? Do you 
have an opinion on this? 



