GitHub user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/16770#discussion_r173544594
--- Diff: python/pyspark/ml/tests.py ---
@@ -640,6 +640,23 @@ def test_count_vectorizer_with_binary(self):
             feature, expected = r
             self.assertEqual(feature, expected)
 
+    def test_count_vectorizer_from_vocab(self):
+        model = CountVectorizerModel.from_vocabulary(["a", "b", "c"],
+                                                     inputCol="words",
--- End diff --
Good first test. I'd love to also see one with an empty vocabulary, and
another that relies on the default parameter values.
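
To make the three cases concrete, here is a rough pure-Python sketch of what they might cover. It does not use PySpark: `count_vectorize` is a hypothetical stand-in for the model's transform, and its `min_tf` (treated as an absolute count only) and `binary` parameters are assumptions made for illustration. In particular, returning an empty vector for an empty vocabulary is this sketch's choice; the real model may well reject that input, which is exactly what a test should pin down.

```python
from collections import Counter


def count_vectorize(vocabulary, tokens, min_tf=1.0, binary=False):
    """Hypothetical stand-in: count each vocabulary term in tokens.

    Assumption: min_tf is an absolute count threshold; terms seen fewer
    than min_tf times are zeroed. With binary=True, surviving counts
    become 1.0.
    """
    counts = Counter(tokens)
    vector = []
    for term in vocabulary:
        count = counts[term]  # Counter returns 0 for unseen terms
        if count < min_tf:
            count = 0
        elif binary:
            count = 1
        vector.append(float(count))
    return vector


# Case 1: explicit vocabulary, as in the test under review.
# Tokens outside the vocabulary ("d") are ignored.
explicit = count_vectorize(["a", "b", "c"], ["a", "b", "b", "d"])

# Case 2: empty vocabulary. This stand-in yields an empty vector.
empty = count_vectorize([], ["a", "b"])

# Case 3: defaults only, compared against a non-default setting.
defaults = count_vectorize(["a", "b"], ["a", "a", "b"])
binary = count_vectorize(["a", "b"], ["a", "a", "b"], binary=True)
```

The point of the third case is that a test passing only explicit arguments never exercises the fallback values, so a regression in the defaults would go unnoticed.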