amanomer commented on issue #26454: [SPARK-29818][MLLIB] Missing persist on RDD
URL: https://github.com/apache/spark/pull/26454#issuecomment-552176930

I have not tested this patch against any performance benchmark, but I think these functions are quite generic, and most applications/vendors are probably using them. So wouldn't it be better to optimize them, as we already do in other places?

https://github.com/apache/spark/blob/57b954e825970f004895ac127083da67e10c09fb/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala#L155-L158
https://github.com/apache/spark/blob/57b954e825970f004895ac127083da67e10c09fb/mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala#L227

@MaxGekk Kindly correct me if I am wrong. Thanks.
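
For context, here is a minimal sketch of the general persist/unpersist pattern under discussion. It is not the code from this patch or from the linked files; the object name `PersistSketch`, the `scored` RDD, the local master setting, and the choice of `StorageLevel.MEMORY_AND_DISK` are all assumptions made for illustration. The idea is only that an RDD consumed by more than one action gets recomputed from its lineage each time unless it is persisted first.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical example object; not part of Spark or of this PR.
object PersistSketch {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run standalone.
    val conf = new SparkConf().setAppName("persist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for an RDD that is expensive to compute.
      val scored = sc.parallelize(1 to 1000).map(x => x.toDouble * 0.5)

      // Persist before the first action so later actions reuse the cached
      // partitions instead of re-running the map over the source data.
      scored.persist(StorageLevel.MEMORY_AND_DISK)

      val n = scored.count()   // first action: computes and caches
      val total = scored.sum() // second action: reads the cached data
      println(s"mean = ${total / n}")

      // Release the cached blocks once the RDD is no longer needed.
      scored.unpersist()
    } finally {
      sc.stop()
    }
  }
}
```

Whether persisting pays off in a given function depends on how costly the lineage is and how much memory the cached data takes, which is why a benchmark would still be useful here.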
