[GitHub] purijatin commented on a change in pull request #23549: [SPARK-26616][MLlib] Expose document frequency in IDFModel

GitBox Wed, 16 Jan 2019 00:22:17 -0800

purijatin commented on a change in pull request #23549: [SPARK-26616][MLlib] Expose document frequency in IDFModel
URL: https://github.com/apache/spark/pull/23549#discussion_r248186065


 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/IDF.scala
 ##########
 @@ -178,10 +187,10 @@ object IDFModel extends MLReadable[IDFModel] {
       val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
       val dataPath = new Path(path, "data").toString
       val data = sparkSession.read.parquet(dataPath)
-      val Row(idf: Vector) = MLUtils.convertVectorColumnsToML(data, "idf")
-        .select("idf")
-        .head()
-      val model = new IDFModel(metadata.uid, new feature.IDFModel(OldVectors.fromML(idf)))
+      val Row(idf: Vector, df: Seq[_], numDocs: Long) =
 
 Review comment:
   I have now edited it to add support for loading older models. Apologies, I missed this previously.
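A minimal sketch of the kind of backward-compatible loading described above. It is written in plain Scala so it runs without Spark, and it is not the PR's code. It assumes that models saved before the change stored only `idf`, that newer models also store `docFreq` and `numDocs`, and that a major version of 3 marks the switch. `SavedRecord`, `IdfData` and `IdfLoader` are hypothetical names, not Spark APIs.

```scala
// Hypothetical sketch: choose which saved fields to read based on the
// version recorded in the model metadata. Not a Spark API.

// Hypothetical stand-in for one row read back from the "data" directory.
final case class SavedRecord(columns: Map[String, Any])

// Hypothetical in-memory form of the loaded IDF model state.
final case class IdfData(idf: Array[Double], docFreq: Array[Long], numDocs: Long)

object IdfLoader {
  // Major component of a version string such as "2.4.0" or "3.0.0-SNAPSHOT".
  def majorVersion(sparkVersion: String): Int =
    sparkVersion.takeWhile(_ != '.').toInt

  def load(sparkVersion: String, record: SavedRecord): IdfData = {
    val idf = record.columns("idf").asInstanceOf[Array[Double]]
    if (majorVersion(sparkVersion) >= 3) {
      // Assumed newer layout: idf, docFreq and numDocs were all written.
      val docFreq = record.columns("docFreq").asInstanceOf[Seq[Long]].toArray
      val numDocs = record.columns("numDocs").asInstanceOf[Long]
      IdfData(idf, docFreq, numDocs)
    } else {
      // Assumed older layout: only idf was saved, so fill neutral defaults.
      IdfData(idf, new Array[Long](idf.length), 0L)
    }
  }
}
```

Falling back to zeros keeps old models loadable, at the cost of reporting no document-frequency information for them. Gating on the saved version rather than on which columns exist is one design choice among several.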
   
   Should a test case be added for loading older models? I checked PCASuite, FPGrowthSuite and KMeansSuite, whose models' saved formats also changed between versions, and did not find tests that load older models, so I think that is the norm. I can add one if needed.
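If such a test were added, one possible shape is sketched here in plain Scala with hypothetical names (`OldModelCompat`, `load`, `oldLayoutLoads`). It is not the PR's code. This sketch detects the old layout by the absence of the assumed `docFreq` and `numDocs` fields. It then checks that an idf-only record still loads with zeroed defaults. In a Spark suite, the record would presumably come from a model saved by an older release rather than from an in-memory map.

```scala
// Hypothetical sketch of a backward-compatibility check. Names are
// illustrative only; they are not Spark APIs.
object OldModelCompat {
  final case class Loaded(idf: Seq[Double], docFreq: Seq[Long], numDocs: Long)

  // Detect the saved layout from which fields are present.
  def load(row: Map[String, Any]): Loaded = {
    val idf = row("idf").asInstanceOf[Seq[Double]]
    (row.get("docFreq"), row.get("numDocs")) match {
      // Assumed newer layout. Seq[_] is matched because element types are
      // erased at runtime, as with `df: Seq[_]` in the diff.
      case (Some(df: Seq[_]), Some(n: Long)) =>
        Loaded(idf, df.map(_.asInstanceOf[Long]), n)
      // Older layout: only idf was saved.
      case _ =>
        Loaded(idf, Seq.fill(idf.length)(0L), 0L)
    }
  }

  // An idf-only record must load, with zeroed docFreq and numDocs.
  def oldLayoutLoads(): Boolean = {
    val loaded = load(Map("idf" -> Seq(0.7, 1.2, 0.0)))
    loaded.idf == Seq(0.7, 1.2, 0.0) &&
      loaded.docFreq == Seq(0L, 0L, 0L) &&
      loaded.numDocs == 0L
  }
}
```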
   
   Squashed the commits.
