[GitHub] [spark] amanomer commented on a change in pull request #26454: [SPARK-29818][MLLIB] Missing persist on RDD

GitBox Sun, 10 Nov 2019 01:23:07 -0800

amanomer commented on a change in pull request #26454: [SPARK-29818][MLLIB] Missing persist on RDD
URL: https://github.com/apache/spark/pull/26454#discussion_r344481010


 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala
 ##########
 @@ -141,8 +141,10 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") override val uid: String)
       Some(Array.fill($(numFolds))(Array.fill[Model[_]](epm.length)(null)))
     } else None
 
+    val inputRDD = dataset.toDF.rdd
+    inputRDD.persist()
 
 Review comment:
   > Persisting intermediate results is not always good
   
   Could you kindly explain a case where this happens? And could it be solved by changing the storage level?
   One case where persisting an intermediate result would be inefficient is when the dataset is larger than the available memory.
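Since the question is whether a different storage level avoids the cost of caching a dataset that is larger than memory, here is a minimal Scala sketch, assuming the Spark 2.x/3.x RDD API. `RDD.persist()` with no argument uses `StorageLevel.MEMORY_ONLY`, under which partitions that do not fit in memory are not cached and are recomputed each time they are needed; `StorageLevel.MEMORY_AND_DISK` writes those partitions to local disk instead. The object `PersistSketch` and the helper `withPersistedInput` are hypothetical names for illustration, not code from PR #26454, and releasing the cache with `unpersist()` in a `finally` block is one assumed way to avoid holding it after the folds finish.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, Row, SparkSession}
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Hypothetical helper (not from the PR): caches the input RDD at the given
  // storage level, runs `body` against it (e.g. the k-fold loop), and always
  // releases the cache afterwards, even if `body` throws.
  def withPersistedInput[T](dataset: Dataset[_], level: StorageLevel)(body: RDD[Row] => T): T = {
    val inputRDD = dataset.toDF().rdd
    inputRDD.persist(level)
    try body(inputRDD)
    finally inputRDD.unpersist()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("persist-sketch").getOrCreate()
    try {
      val df = spark.range(100).toDF("x")
      // MEMORY_AND_DISK spills partitions that do not fit in memory to disk,
      // instead of leaving them uncached and recomputing them (MEMORY_ONLY).
      val total = withPersistedInput(df, StorageLevel.MEMORY_AND_DISK) { rdd =>
        // Two passes stand in for the repeated per-fold reads of the input.
        rdd.count() + rdd.count()
      }
      println(s"rows counted across two passes: $total")
    } finally spark.stop()
  }
}
```

Whether spilling to disk beats recomputation depends on disk throughput versus the cost of re-running the input's lineage, so the storage level is best treated as a tunable choice rather than a fixed answer.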
