GitHub user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20313#discussion_r194636521
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -264,7 +265,9 @@ class CountVectorizerModel(
Vectors.sparse(dictBr.value.size, effectiveCounts)
}
- dataset.withColumn($(outputCol), vectorizer(col($(inputCol))))
+ val attrs = vocabulary.map(_ => new NumericAttribute).asInstanceOf[Array[Attribute]]
--- End diff --
Sorry for the late reply. Though I agree that these attributes don't provide
much info, I'm wondering if we can generate them lazily. At this point, I
don't think we know whether a following transformer will need them or not.
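
To illustrate what "lazily" could mean here, below is a minimal sketch in plain Scala. It assumes nothing about Spark: `Attr` and `VocabModel` are hypothetical stand-ins (not `NumericAttribute` or `CountVectorizerModel`), and the `builds` counter exists only to make the deferred evaluation visible. Whether this fits the place where the column metadata is attached in `transform` is a separate question that this sketch does not answer.

```scala
// Hypothetical stand-ins, NOT Spark classes; they only illustrate deferred construction.
final case class Attr(name: String)

final class VocabModel(val vocabulary: Array[String]) {
  // Counts how many times the attribute array is built (illustration only).
  var builds: Int = 0

  // A `lazy val` is evaluated on first access and the result is cached afterwards.
  lazy val attrs: Array[Attr] = {
    builds += 1
    vocabulary.map(term => Attr(term))
  }
}

object LazyAttrsDemo {
  def main(args: Array[String]): Unit = {
    val model = new VocabModel(Array("a", "b", "c"))
    assert(model.builds == 0)        // nothing has been built yet
    assert(model.attrs.length == 3)  // first access triggers the build
    assert(model.attrs.length == 3)  // second access reuses the cached array
    assert(model.builds == 1)
  }
}
```

With this shape, a model whose attributes are never read pays nothing for them, while the first consumer pays the one-time cost of a single pass over the vocabulary.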
---