purijatin commented on a change in pull request #23549: [SPARK-26616][MLlib] 
Expose document frequency in IDFModel
URL: https://github.com/apache/spark/pull/23549#discussion_r248901834
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/IDF.scala
 ##########
 @@ -178,10 +188,19 @@ object IDFModel extends MLReadable[IDFModel] {
       val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
       val dataPath = new Path(path, "data").toString
       val data = sparkSession.read.parquet(dataPath)
-      val Row(idf: Vector) = MLUtils.convertVectorColumnsToML(data, "idf")
-        .select("idf")
-        .head()
-      val model = new IDFModel(metadata.uid, new 
feature.IDFModel(OldVectors.fromML(idf)))
+
+      val model = if (majorVersion(metadata.sparkVersion) >= 3) {
+        val Row(idf: Vector, df: Seq[_], numDocs: Long) = data.select("idf", 
"docFreq", "numDocs")
+          .head()
+        new IDFModel(metadata.uid, new feature.IDFModel(OldVectors.fromML(idf),
+          df.asInstanceOf[Seq[Long]].toArray, numDocs))
+      } else {
+        val Row(idf: Vector) = MLUtils.convertVectorColumnsToML(data, "idf")
+          .select("idf")
+          .head()
+        new IDFModel(metadata.uid,
 
 Review comment:
   Yes.
   
   Another problem with returning option is that, people using the class from 
spark 3 onwards, will get an option. And it would be confusing because it may 
mean that docFreq may not be present. But it always would be. We would then 
need to have doc explaining the scenario of None,complicating the doc. 
   
   Secondly it would need to remain consistent with other models. Where all new 
fields added should return option. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to