[jira] [Comment Edited] (SPARK-21919) inconsistent behavior of AFTsurvivalRegression algorithm

2017-09-07 Thread Yanbo Liang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156979#comment-16156979 ]

Yanbo Liang edited comment on SPARK-21919 at 9/7/17 2:02 PM:
-

[~ashishchopra0308] [~srowen] I can't reproduce this issue; I get a correct
result that is consistent with R's {{survreg}}. I just pasted your code into the
{{bin/pyspark}} console and got:
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
... (21.218, 1.0, Vectors.dense(1.560, -0.605)),
... (22.949, 0.0, Vectors.dense(0.346, 2.158)),
... (23.627, 0.0, Vectors.dense(1.380, 0.231)),
... (20.273, 1.0, Vectors.dense(0.520, 1.151)),
... (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
... "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...  quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}
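The table can be cross-checked without Spark. Under the Weibull AFT model that {{AFTSurvivalRegression}} fits, each prediction should equal exp(coefficients . features + intercept), and each quantile should equal prediction * (-log(1 - p))^scale. The plain-Python sketch below recomputes both from the printed coefficients, intercept and scale. It assumes that parameterization and is not Spark's own code. The helper names {{predict}} and {{predict_quantiles}} are hypothetical.

```python
import math

# Parameters printed by the fitted model in the session above.
coefficients = [-0.065814695216, 0.00326705958509]
intercept = 3.29140205698
scale = 0.109856123692
quantile_probabilities = [0.3, 0.6]


def predict(features):
    """Point prediction: exp(coefficients . features + intercept)."""
    margin = sum(c * x for c, x in zip(coefficients, features)) + intercept
    return math.exp(margin)


def predict_quantiles(features, probs=quantile_probabilities):
    """Weibull quantiles: prediction * (-log(1 - p)) ** scale for each p."""
    lam = predict(features)
    # -log1p(-p) == -log(1 - p), computed in a numerically stable way.
    return [lam * (-math.log1p(-p)) ** scale for p in probs]


# Feature vectors of the five training rows.
rows = [
    [1.560, -0.605],
    [0.346, 2.158],
    [1.380, 0.231],
    [0.520, 1.151],
    [0.795, -0.226],
]

for features in rows:
    print(features, predict(features), predict_quantiles(features))
```

The recomputed values should agree with the {{prediction}} and {{quantiles}} columns, up to the rounding of the printed parameters.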


was (Author: yanboliang):
[~ashishchopra0308] [~srowen] I can't reproduce this issue; I get a correct
result that is consistent with R's {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
... (21.218, 1.0, Vectors.dense(1.560, -0.605)),
... (22.949, 0.0, Vectors.dense(0.346, 2.158)),
... (23.627, 0.0, Vectors.dense(1.380, 0.231)),
... (20.273, 1.0, Vectors.dense(0.520, 1.151)),
... (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
... "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...  quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}
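Here is one way to see why this output is consistent with {{survreg}}. Both tools fit the same Weibull AFT model by maximum likelihood, so the gradient of the negative log-likelihood should be numerically zero at the printed parameters. The plain-Python sketch below hand-codes that gradient. It rests on three assumptions:
* the per-row loss is delta*log(sigma) - delta*eps + exp(eps), with eps = (log(label) - coefficients . features - intercept) / sigma;
* censor = 1.0 marks an observed event, which is Spark ML's convention;
* the gradient is taken with respect to log(sigma).

The helper {{weibull_aft_nll_gradient}} is a hypothetical name, not a Spark API.

```python
import math

# Training rows from the session above: (label, censor, features).
# Assumption: censor = 1.0 means the event occurred, 0.0 means censored.
training = [
    (21.218, 1.0, [1.560, -0.605]),
    (22.949, 0.0, [0.346, 2.158]),
    (23.627, 0.0, [1.380, 0.231]),
    (20.273, 1.0, [0.520, 1.151]),
    (24.199, 0.0, [0.795, -0.226]),
]

# Parameters printed by the fitted model.
coefficients = [-0.065814695216, 0.00326705958509]
intercept = 3.29140205698
scale = 0.109856123692


def weibull_aft_nll_gradient(coefs, b, sigma, data):
    """Gradient of the Weibull AFT negative log-likelihood (hypothetical helper).

    Per row: loss = delta * log(sigma) - delta * eps + exp(eps),
    where eps = (log(label) - coefs . x - b) / sigma.
    Returns [d/db, d/dcoef_0, ..., d/dcoef_k, d/dlog(sigma)].
    """
    g_b = 0.0
    g_coefs = [0.0] * len(coefs)
    g_log_sigma = 0.0
    for label, delta, x in data:
        margin = sum(c * xi for c, xi in zip(coefs, x)) + b
        eps = (math.log(label) - margin) / sigma
        resid = delta - math.exp(eps)  # d loss / d eps = -resid
        # d eps / d b = -1/sigma and d eps / d coef_j = -x_j/sigma
        g_b += resid / sigma
        for j, xj in enumerate(x):
            g_coefs[j] += resid * xj / sigma
        # d eps / d log(sigma) = -eps, so d loss / d log(sigma) = delta + eps * resid
        g_log_sigma += delta + eps * resid
    return [g_b] + g_coefs + [g_log_sigma]


grad = weibull_aft_nll_gradient(coefficients, intercept, scale, training)
print("gradient at fitted params:", grad)

# For contrast, move the intercept away from the optimum.
grad_off = weibull_aft_nll_gradient(coefficients, intercept + 0.05, scale, training)
print("gradient with intercept + 0.05:", grad_off)
```

A near-zero gradient at the fitted point, and a clearly nonzero one after the shift, indicates that the printed parameters are a stationary point of the Weibull AFT likelihood. Any tool that maximizes the same likelihood on the same data should land on that same point.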

> inconsistent behavior of AFTsurvivalRegression algorithm
> 
>
> Key: SPARK-21919
> URL: https://issues.apache.org/jira/browse/SPARK-21919
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark
> Affects Versions: 2.2.0
> Environment: Spark Version: 2.2.0
> Cluster setup: Standalone single node
> Python version: 3.5.2
> Reporter: Ashish Chopra
>
> Took the example directly from the Spark ML documentation.
> {code}
> training = spark.createDataFrame([
> (1.218, 1.0, Vectors.dense(1.560, -0.605)),
> (2.949, 0.0, Vectors.dense(0.346, 2.158)),
> (3.627, 0.0, Vectors.dense(1.380, 0.231)),

[jira] [Comment Edited] (SPARK-21919) inconsistent behavior of AFTsurvivalRegression algorithm

2017-09-07 Thread Yanbo Liang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156979#comment-16156979 ]

Yanbo Liang edited comment on SPARK-21919 at 9/7/17 2:02 PM:
-

[~ashishchopra0308] [~srowen] I can't reproduce this issue; I get a correct
result that is consistent with R's {{survreg}}. I just pasted your code into
{{bin/pyspark}} and got:
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
... (21.218, 1.0, Vectors.dense(1.560, -0.605)),
... (22.949, 0.0, Vectors.dense(0.346, 2.158)),
... (23.627, 0.0, Vectors.dense(1.380, 0.231)),
... (20.273, 1.0, Vectors.dense(0.520, 1.151)),
... (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
... "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...  quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}


was (Author: yanboliang):
[~ashishchopra0308] [~srowen] I can't reproduce this issue; I get a correct
result that is consistent with R's {{survreg}}. I just pasted your code into the
{{bin/pyspark}} console and got:
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
... (21.218, 1.0, Vectors.dense(1.560, -0.605)),
... (22.949, 0.0, Vectors.dense(0.346, 2.158)),
... (23.627, 0.0, Vectors.dense(1.380, 0.231)),
... (20.273, 1.0, Vectors.dense(0.520, 1.151)),
... (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
... "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...  quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}

> inconsistent behavior of AFTsurvivalRegression algorithm
> 
>
> Key: SPARK-21919
> URL: https://issues.apache.org/jira/browse/SPARK-21919
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark
> Affects Versions: 2.2.0
> Environment: Spark Version: 2.2.0
> Cluster setup: Standalone single node
> Python version: 3.5.2
> Reporter: Ashish Chopra
>
> Took the example directly from the Spark ML documentation.
> {code}
> training = spark.createDataFrame([
> (1.218, 1.0, Vectors.dense(1.560, -0.605)),
> (2.949, 0.0, Vectors.dense(0.346, 2.158)

[jira] [Comment Edited] (SPARK-21919) inconsistent behavior of AFTsurvivalRegression algorithm

2017-09-07 Thread Yanbo Liang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156979#comment-16156979 ]

Yanbo Liang edited comment on SPARK-21919 at 9/7/17 1:59 PM:
-

[~ashishchopra0308] [~srowen] I can't reproduce this issue; I get a correct
result that is consistent with R's {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
... (21.218, 1.0, Vectors.dense(1.560, -0.605)),
... (22.949, 0.0, Vectors.dense(0.346, 2.158)),
... (23.627, 0.0, Vectors.dense(1.380, 0.231)),
... (20.273, 1.0, Vectors.dense(0.520, 1.151)),
... (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
... "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...  quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}


was (Author: yanboliang):
[~ashishchopra0308] [~srowen] I can't reproduce this issue; I get a correct
result that is consistent with R's {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
... (21.218, 1.0, Vectors.dense(1.560, -0.605)),
... (22.949, 0.0, Vectors.dense(0.346, 2.158)),
... (23.627, 0.0, Vectors.dense(1.380, 0.231)),
... (20.273, 1.0, Vectors.dense(0.520, 1.151)),
... (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
... "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...  quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}

> inconsistent behavior of AFTsurvivalRegression algorithm
> 
>
> Key: SPARK-21919
> URL: https://issues.apache.org/jira/browse/SPARK-21919
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark
> Affects Versions: 2.2.0
> Environment: Spark Version: 2.2.0
> Cluster setup: Standalone single node
> Python version: 3.5.2
> Reporter: Ashish Chopra
>
> Took the example directly from the Spark ML documentation.
> {code}
> training = spark.createDataFrame([
> (

[jira] [Comment Edited] (SPARK-21919) inconsistent behavior of AFTsurvivalRegression algorithm

2017-09-07 Thread Yanbo Liang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156979#comment-16156979 ]

Yanbo Liang edited comment on SPARK-21919 at 9/7/17 1:58 PM:
-

[~ashishchopra0308] [~srowen] I can't reproduce this issue; I get a correct
result that is consistent with R's {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
... (21.218, 1.0, Vectors.dense(1.560, -0.605)),
... (22.949, 0.0, Vectors.dense(0.346, 2.158)),
... (23.627, 0.0, Vectors.dense(1.380, 0.231)),
... (20.273, 1.0, Vectors.dense(0.520, 1.151)),
... (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
... "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...  quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}


was (Author: yanboliang):
[~ashishchopra0308] [~srowen] I can't reproduce this issue; I get a correct
result that is consistent with R's {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
... (21.218, 1.0, Vectors.dense(1.560, -0.605)),
... (22.949, 0.0, Vectors.dense(0.346, 2.158)),
... (23.627, 0.0, Vectors.dense(1.380, 0.231)),
... (20.273, 1.0, Vectors.dense(0.520, 1.151)),
... (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
... "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...  quantilesCol="quantiles")
>>> model = aft.fit(training)
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.5
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.25
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.5
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.25
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.125
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209