[jira] [Comment Edited] (SPARK-21919) inconsistent behavior of AFTsurvivalRegression algorithm
[ https://issues.apache.org/jira/browse/SPARK-21919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156979#comment-16156979 ]

Yanbo Liang edited comment on SPARK-21919 at 9/7/17 2:02 PM:
------------------------------------------------------------

[~ashishchopra0308] [~srowen] I can't reproduce this issue, I can get correct result which is consistent with R {{survreg}}. I just paste your code into console of {{bin/pyspark}} and got:
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
...     (21.218, 1.0, Vectors.dense(1.560, -0.605)),
...     (22.949, 0.0, Vectors.dense(0.346, 2.158)),
...     (23.627, 0.0, Vectors.dense(1.380, 0.231)),
...     (20.273, 1.0, Vectors.dense(0.520, 1.151)),
...     (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
...     "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...                             quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}

was (Author: yanboliang):
[~ashishchopra0308] [~srowen] I can't reproduce this issue, I can get correct result which is consistent with R {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
...     (21.218, 1.0, Vectors.dense(1.560, -0.605)),
...     (22.949, 0.0, Vectors.dense(0.346, 2.158)),
...     (23.627, 0.0, Vectors.dense(1.380, 0.231)),
...     (20.273, 1.0, Vectors.dense(0.520, 1.151)),
...     (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
...     "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...                             quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}

> inconsistent behavior of AFTsurvivalRegression algorithm
> --------------------------------------------------------
>
> Key: SPARK-21919
> URL: https://issues.apache.org/jira/browse/SPARK-21919
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark
> Affects Versions: 2.2.0
> Environment: Spark Version: 2.2.0
> Cluster setup: Standalone single node
> Python version: 3.5.2
> Reporter: Ashish Chopra
>
> Took the direct example from spark ml documentation.
> {code}
> training = spark.createDataFrame([
>     (1.218, 1.0, Vectors.dense(1.560, -0.605)),
>     (2.949, 0.0, Vectors.dense(0.346, 2.158)),
>     (3.627, 0.0, Vectors.dense(1.380, 0.231)),
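
Editorial note: the fitted parameters printed in the comment above can be sanity-checked by hand. The sketch below assumes the Weibull AFT form described in the Spark ML documentation, where the prediction is exp(coefficients . features + intercept) and each quantile is the prediction multiplied by exp(log(-log(1 - p)) * scale). The helper names `predict` and `predict_quantiles` are illustrative only and are not part of the PySpark API; the numeric inputs are copied from the console output.

```python
import math

# Values copied from the console output in the comment above.
coefficients = [-0.065814695216, 0.00326705958509]
intercept = 3.29140205698
scale = 0.109856123692
quantile_probabilities = [0.3, 0.6]


def predict(features):
    """Hypothetical helper: exp of the linear predictor (assumed Weibull AFT form)."""
    linear = sum(c * x for c, x in zip(coefficients, features)) + intercept
    return math.exp(linear)


def predict_quantiles(features):
    """Hypothetical helper: scale the prediction by the Weibull quantile factor."""
    base = predict(features)
    return [base * math.exp(math.log(-math.log(1.0 - p)) * scale)
            for p in quantile_probabilities]


# First row of the training data: features [1.560, -0.605].
first_row = [1.560, -0.605]
prediction = predict(first_row)
quantiles = predict_quantiles(first_row)
print(prediction, quantiles)
```

Under these assumptions the values for the first row should agree with the `prediction` and `quantiles` columns of the table to within the rounding of the printed parameters.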
[jira] [Comment Edited] (SPARK-21919) inconsistent behavior of AFTsurvivalRegression algorithm
[ https://issues.apache.org/jira/browse/SPARK-21919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156979#comment-16156979 ]

Yanbo Liang edited comment on SPARK-21919 at 9/7/17 2:02 PM:
------------------------------------------------------------

[~ashishchopra0308] [~srowen] I can't reproduce this issue, I can get correct result which is consistent with R {{survreg}}. I just pasted your code into {{bin/pyspark}} and got:
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
...     (21.218, 1.0, Vectors.dense(1.560, -0.605)),
...     (22.949, 0.0, Vectors.dense(0.346, 2.158)),
...     (23.627, 0.0, Vectors.dense(1.380, 0.231)),
...     (20.273, 1.0, Vectors.dense(0.520, 1.151)),
...     (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
...     "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...                             quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}

was (Author: yanboliang):
[~ashishchopra0308] [~srowen] I can't reproduce this issue, I can get correct result which is consistent with R {{survreg}}. I just paste your code into console of {{bin/pyspark}} and got:
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
...     (21.218, 1.0, Vectors.dense(1.560, -0.605)),
...     (22.949, 0.0, Vectors.dense(0.346, 2.158)),
...     (23.627, 0.0, Vectors.dense(1.380, 0.231)),
...     (20.273, 1.0, Vectors.dense(0.520, 1.151)),
...     (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
...     "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...                             quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}

> inconsistent behavior of AFTsurvivalRegression algorithm
> --------------------------------------------------------
>
> Key: SPARK-21919
> URL: https://issues.apache.org/jira/browse/SPARK-21919
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark
> Affects Versions: 2.2.0
> Environment: Spark Version: 2.2.0
> Cluster setup: Standalone single node
> Python version: 3.5.2
> Reporter: Ashish Chopra
>
> Took the direct example from spark ml documentation.
> {code}
> training = spark.createDataFrame([
>     (1.218, 1.0, Vectors.dense(1.560, -0.605)),
>     (2.949, 0.0, Vectors.dense(0.346, 2.158)
[jira] [Comment Edited] (SPARK-21919) inconsistent behavior of AFTsurvivalRegression algorithm
[ https://issues.apache.org/jira/browse/SPARK-21919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156979#comment-16156979 ]

Yanbo Liang edited comment on SPARK-21919 at 9/7/17 1:59 PM:
------------------------------------------------------------

[~ashishchopra0308] [~srowen] I can't reproduce this issue, I can get correct result which is consistent with R {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
...     (21.218, 1.0, Vectors.dense(1.560, -0.605)),
...     (22.949, 0.0, Vectors.dense(0.346, 2.158)),
...     (23.627, 0.0, Vectors.dense(1.380, 0.231)),
...     (20.273, 1.0, Vectors.dense(0.520, 1.151)),
...     (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
...     "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...                             quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}

was (Author: yanboliang):
[~ashishchopra0308] [~srowen] I can't reproduce this issue, I can get correct result which is consistent with R {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
...     (21.218, 1.0, Vectors.dense(1.560, -0.605)),
...     (22.949, 0.0, Vectors.dense(0.346, 2.158)),
...     (23.627, 0.0, Vectors.dense(1.380, 0.231)),
...     (20.273, 1.0, Vectors.dense(0.520, 1.151)),
...     (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
...     "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...                             quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}

> inconsistent behavior of AFTsurvivalRegression algorithm
> --------------------------------------------------------
>
> Key: SPARK-21919
> URL: https://issues.apache.org/jira/browse/SPARK-21919
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark
> Affects Versions: 2.2.0
> Environment: Spark Version: 2.2.0
> Cluster setup: Standalone single node
> Python version: 3.5.2
> Reporter: Ashish Chopra
>
> Took the direct example from spark ml documentation.
> {code}
> training = spark.createDataFrame([
> (
[jira] [Comment Edited] (SPARK-21919) inconsistent behavior of AFTsurvivalRegression algorithm
[ https://issues.apache.org/jira/browse/SPARK-21919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156979#comment-16156979 ]

Yanbo Liang edited comment on SPARK-21919 at 9/7/17 1:58 PM:
------------------------------------------------------------

[~ashishchopra0308] [~srowen] I can't reproduce this issue, I can get correct result which is consistent with R {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
...     (21.218, 1.0, Vectors.dense(1.560, -0.605)),
...     (22.949, 0.0, Vectors.dense(0.346, 2.158)),
...     (23.627, 0.0, Vectors.dense(1.380, 0.231)),
...     (20.273, 1.0, Vectors.dense(0.520, 1.151)),
...     (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
...     "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...                             quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}

was (Author: yanboliang):
[~ashishchopra0308] [~srowen] I can't reproduce this issue, I can get correct result which is consistent with R {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
...     (21.218, 1.0, Vectors.dense(1.560, -0.605)),
...     (22.949, 0.0, Vectors.dense(0.346, 2.158)),
...     (23.627, 0.0, Vectors.dense(1.380, 0.231)),
...     (20.273, 1.0, Vectors.dense(0.520, 1.151)),
...     (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
...     "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...                             quantilesCol="quantiles")
>>> model = aft.fit(training)
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.5
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.25
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.5
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.25
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.125
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209