MaxGekk edited a comment on issue #26454: [SPARK-29818][MLLIB] Missing persist on RDD
URL: https://github.com/apache/spark/pull/26454#issuecomment-552135269

Persisting intermediate results is not always beneficial: serialization has a cost, and some vendors can address the performance issue in other ways, such as [IO caching](https://docs.databricks.com/delta/optimizations/delta-cache.html#optimize-performance-with-caching).

/cc @gatorsmile @cloud-fan @amanomer Just in case, do you observe performance improvements on any benchmarks or use cases?
