[ 
https://issues.apache.org/jira/browse/SPARK-29832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975799#comment-16975799
 ] 

Sean R. Owen commented on SPARK-29832:
--------------------------------------

[~spark_cachecheck] some of these may be valid, but a lot of them don't appear 
to be. Let's tackle a few before opening the 30 JIRAs you did -- that's very 
noisy. 

run() is going to use (a transform of) this input many times in a loop, so I 
don't think this analysis is accurate. You could argue that it's better to 
persist closer to where the loop is; sometimes that's better, sometimes not, 
depending on how big and expensive the result is.
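
To illustrate the reuse point, here is a minimal sketch in plain Scala. It is 
not Spark code and not what run() actually does; it uses a lazy collection 
view as a stand-in for an unpersisted RDD, and every name in it 
(PersistSketch, countEvaluations, the doubling transform) is hypothetical. It 
only shows that a lazily computed input is re-evaluated on each pass unless 
it is materialized first.

```scala
object PersistSketch {
  // Returns how many times the (pretend) expensive transform is evaluated
  // when the transformed input is traversed `passes` times.
  // Assumption: a lazy view stands in for an unpersisted RDD, and
  // materializing it with toVector stands in for persist().
  def countEvaluations(persist: Boolean, passes: Int): Int = {
    var evaluations = 0
    val input = Vector(1, 2, 3, 4)

    // Lazy: the function below runs again on every traversal of the view.
    val transformed = input.view.map { x =>
      evaluations += 1
      x * 2
    }

    // "Persisting" evaluates the transform once, up front.
    val data: Iterable[Int] =
      if (persist) transformed.toVector else transformed

    // The loop: each iteration makes one full pass over the data.
    for (_ <- 1 to passes) data.foreach(_ => ())

    evaluations
  }

  def main(args: Array[String]): Unit = {
    println(countEvaluations(persist = false, passes = 3)) // 12
    println(countEvaluations(persist = true, passes = 3))  // 4
    println(countEvaluations(persist = false, passes = 1)) // 4
  }
}
```

With a single pass the two variants do the same amount of work, which is the 
case the report assumes; with several passes the unmaterialized input is 
recomputed each time, which is the case I believe applies to run().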

> Unnecessary persist on instances in ml.regression.IsotonicRegression.fit
> ------------------------------------------------------------------------
>
>                 Key: SPARK-29832
>                 URL: https://issues.apache.org/jira/browse/SPARK-29832
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0
>            Reporter: Dong Wang
>            Priority: Major
>
> Persist on instances in ml.regression.IsotonicRegression.fit() is 
> unnecessary, because it is only used once in run(instances).
> {code:scala}
>   override def fit(dataset: Dataset[_]): IsotonicRegressionModel = instrumented { instr =>
>     transformSchema(dataset.schema, logging = true)
>     // Extract columns from data.  If dataset is persisted, do not persist oldDataset.
>     val instances = extractWeightedLabeledPoints(dataset)
>     val handlePersistence = dataset.storageLevel == StorageLevel.NONE
>     // Unnecessary persist
>     if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
>     instr.logPipelineStage(this)
>     instr.logDataset(dataset)
>     instr.logParams(this, labelCol, featuresCol, weightCol, predictionCol, featureIndex, isotonic)
>     instr.logNumFeatures(1)
>     val isotonicRegression = new MLlibIsotonicRegression().setIsotonic($(isotonic))
>     val oldModel = isotonicRegression.run(instances) // Only use once here
>     if (handlePersistence) instances.unpersist()
> {code}
> This issue is reported by our tool CacheCheck, which is used to dynamically 
> detect persist()/unpersist() API misuses.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
