[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-20 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-647007802


   Thank you all!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-19 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-646725310


   Right. They are all sealed or private. No public API changes. @srowen 






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-19 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-646701831


   @srowen Any more comments? 






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-17 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-645775168


   retest this please






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-17 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-645774417


   I moved `asBinary` back to `LogisticRegressionSummary` to get rid of this:
   ```
   ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.asBinary")
   ```
   All the rest of the MiMa problems are InheritedNewAbstractMethodProblem. I think those are OK.






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-16 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-644862497


   retest this please






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-16 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-644837229


   retest this please






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-16 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-644827487


   retest this please






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-15 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-644526286


   retest this please






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-15 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-643945627


   retest this please
   






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-14 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-643896578


   retest this please






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-11 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-642907472


   @zhengruifeng Thanks for your comments.
   I think the traits should look like this (weightCol etc. omitted for simplicity):
   ```
   trait ClassificationSummary {
     def predictionCol: String
     def labelCol: String
     val multiclassMetrics =
       new MulticlassMetrics(
         predictions.select(col(predictionCol), col(labelCol).cast(DoubleType))
         . . . . . .
       )
   }

   trait BinaryClassificationSummary extends ClassificationSummary {
     def rawPredictionCol: String
     val binaryMetrics =
       new BinaryClassificationMetrics(
         predictions.select(col(rawPredictionCol), col(labelCol).cast(DoubleType))
         . . . . . .
       )
   }
   ```
   However, BinaryLogisticRegressionSummary currently uses probabilityCol for BinaryClassificationMetrics, and this probabilityCol is defined in LogisticRegressionSummary rather than in BinaryLogisticRegressionSummary. To avoid breaking the existing code, I need to make two changes to the traits above:
   1. Change rawPredictionCol to scoreCol.
      - Can't use rawPredictionCol, since LogisticRegression currently uses probabilityCol.
      - Can't use probabilityCol, since LinearSVC doesn't have probabilityCol.
   2. Put scoreCol in ClassificationSummary (since probabilityCol is currently in LogisticRegressionSummary instead of BinaryLogisticRegressionSummary).

   That's how I arrived at the current traits:
   ```
   trait ClassificationSummary {
     def scoreCol: String
     def predictionCol: String
     def labelCol: String
     val multiclassMetrics =
       new MulticlassMetrics(
         predictions.select(col(predictionCol), col(labelCol).cast(DoubleType))
         . . . . . .
       )
   }

   trait BinaryClassificationSummary extends ClassificationSummary {
     val binaryMetrics =
       new BinaryClassificationMetrics(
         predictions.select(col(scoreCol), col(labelCol).cast(DoubleType))
         . . . . . .
       )
   }
   ```
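
For readers who want something compilable, the traits above could be fleshed out roughly as follows. This is only a sketch of the design described in this comment, not the code in the PR: the `predictions` member, the `accuracy` and `areaUnderROC` helpers, the use of `lazy val`, and the assumption that the score column holds a two-element `Vector` whose index 1 is the positive-class score are all illustrative choices.

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.{BinaryClassificationMetrics, MulticlassMetrics}
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// Sketch only: follows the traits described in the comment, not the merged Spark code.
trait ClassificationSummary {
  def predictions: DataFrame // assumed member; the comment uses it without declaring it
  def scoreCol: String
  def predictionCol: String
  def labelCol: String

  // lazy, so the abstract members above are initialized before first use
  lazy val multiclassMetrics: MulticlassMetrics =
    new MulticlassMetrics(
      predictions
        .select(col(predictionCol).cast(DoubleType), col(labelCol).cast(DoubleType))
        .rdd
        .map { case Row(prediction: Double, label: Double) => (prediction, label) })

  // hypothetical convenience accessor
  def accuracy: Double = multiclassMetrics.accuracy
}

trait BinaryClassificationSummary extends ClassificationSummary {
  // Assumes scoreCol is a Vector column (probability or raw prediction)
  // and that index 1 is the score of the positive class.
  lazy val binaryMetrics: BinaryClassificationMetrics =
    new BinaryClassificationMetrics(
      predictions
        .select(col(scoreCol), col(labelCol).cast(DoubleType))
        .rdd
        .map { case Row(score: Vector, label: Double) => (score(1), label) })

  // hypothetical convenience accessor
  def areaUnderROC: Double = binaryMetrics.areaUnderROC()
}
```

With this shape, a concrete summary only has to supply the DataFrame and the column names; which column plays the role of `scoreCol` is up to each classifier.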
   To implement summaries for other classifiers:
   ```
   LinearSVCSummary extends BinaryClassificationSummary  // use rawPredictionCol
   FMClassifierSummary extends BinaryClassificationSummary  // use probabilityCol
   ```
   For RandomForestClassifier (also for DecisionTreeClassifier and GBTClassifier):
   ```
   RandomForestSummary extends ClassificationSummary
   BinaryRandomForestSummary extends BinaryClassificationSummary  // use probabilityCol
   if (numOfClass == 2)
     summary = BinaryRandomForestSummary
   else
     summary = RandomForestSummary
   ```
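
The class-count dispatch above can be modeled without Spark to make the intended type relationships concrete. Everything in this sketch is a hypothetical stand-in: the traits carry only column names, and `SummaryFactory.forRandomForest` and the default column names are invented for illustration.

```scala
// Hypothetical, Spark-free stand-ins for the traits discussed above.
trait ClassificationSummary {
  def scoreCol: String
  def predictionCol: String
  def labelCol: String
}

trait BinaryClassificationSummary extends ClassificationSummary

final case class RandomForestSummary(
    scoreCol: String,
    predictionCol: String,
    labelCol: String) extends ClassificationSummary

final case class BinaryRandomForestSummary(
    scoreCol: String,
    predictionCol: String,
    labelCol: String) extends BinaryClassificationSummary

object SummaryFactory {
  // Pick the binary summary only when the model has exactly two classes.
  def forRandomForest(numClasses: Int): ClassificationSummary =
    if (numClasses == 2) {
      BinaryRandomForestSummary("probability", "prediction", "label")
    } else {
      RandomForestSummary("probability", "prediction", "label")
    }
}
```

A caller that needs binary-only metrics can then pattern match on `BinaryClassificationSummary` and fall back to the multiclass view otherwise.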
   






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-03 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-638460829


   also cc @zhengruifeng 






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-03 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-638460384


   I put probabilityCol back in the subclass, and this should fix the test failure I had yesterday :)
   For the multiclass or binary classification metrics we need, it seems to me that either probability or raw prediction is good enough.


