GitHub user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/16770#discussion_r173545477
--- Diff: python/pyspark/ml/feature.py ---
@@ -437,33 +498,20 @@ class CountVectorizer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable,
>>> loadedModel = CountVectorizerModel.load(modelPath)
>>> loadedModel.vocabulary == model.vocabulary
True
+ >>> fromVocabModel = CountVectorizerModel.from_vocabulary(model.vocabulary,
--- End diff --
This might be clearer with an explicit, hand-written array rather than
model.vocabulary, to show folks how they are expected to use it. What are your thoughts?