[ https://issues.apache.org/jira/browse/SPARK-29756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengruifeng resolved SPARK-29756.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26398
[https://github.com/apache/spark/pull/26398]

> CountVectorizer forget to unpersist intermediate rdd
> ----------------------------------------------------
>
>                 Key: SPARK-29756
>                 URL: https://issues.apache.org/jira/browse/SPARK-29756
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0
>            Reporter: zhengruifeng
>            Assignee: zhengruifeng
>            Priority: Trivial
>             Fix For: 3.0.0
>
>
> {code:java}
> scala> val df = spark.createDataFrame(Seq(
>      |   (0, Array("a", "b", "c")),
>      |   (1, Array("a", "b", "b", "c", "a"))
>      | )).toDF("id", "words")
> df: org.apache.spark.sql.DataFrame = [id: int, words: array<string>]
>
> scala> import org.apache.spark.ml.feature._
> import org.apache.spark.ml.feature._
>
> scala> val cvModel: CountVectorizerModel = new CountVectorizer().setInputCol("words").setOutputCol("features").setVocabSize(3).setMinDF(2).fit(df)
> cvModel: org.apache.spark.ml.feature.CountVectorizerModel = cntVec_5edcfe4828c2
>
> scala> sc.getPersistentRDDs
> res0: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(9 -> MapPartitionsRDD[9] at map at CountVectorizer.scala:223)
> {code}
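For readers unfamiliar with the leak reported above, the general persist/unpersist pattern can be sketched as follows. This is not the change made in pull request 26398; it is a minimal illustration, assuming a spark-shell session (entered via `:paste`) where `sc` is the active SparkContext. The names `docs`, `wordCounts`, `numTerms`, `vocab` and `stillCached` are hypothetical and do not come from CountVectorizer.scala.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical stand-in for the tokenized input documents.
val docs: RDD[Seq[String]] = sc.parallelize(Seq(
  Seq("a", "b", "c"),
  Seq("a", "b", "b", "c", "a")
))

// Intermediate RDD of (term, document frequency), cached because
// more than one action reads it below.
val wordCounts: RDD[(String, Long)] = docs
  .flatMap(_.distinct.map(w => (w, 1L)))
  .reduceByKey(_ + _)
  .persist(StorageLevel.MEMORY_AND_DISK)

// Run the actions inside try/finally so the cache is released
// even if one of them throws.
val (numTerms, vocab) = try {
  val n = wordCounts.count()
  val v = wordCounts.filter(_._2 >= 2L).keys.collect().sorted
  (n, v)
} finally {
  wordCounts.unpersist()
}

// Check only this RDD's id: the session may hold other cached RDDs.
val stillCached: Boolean = sc.getPersistentRDDs.contains(wordCounts.id)
```

Checking `sc.getPersistentRDDs` for the specific RDD id, as in the report, is a quick way to confirm that an intermediate RDD was released once the computation finished.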