GitHub user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19659#discussion_r149479380
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/NGram.scala ---
@@ -42,11 +42,22 @@ class NGram @Since("1.5.0") (@Since("1.5.0") override val uid: String)
/**
* Minimum n-gram length, greater than or equal to 1.
+ * All values of m such that n <= m <= maxN will be used.
* Default: 2, bigram features
* @group param
*/
@Since("1.5.0")
- val n: IntParam = new IntParam(this, "n", "number elements per n-gram (>=1)",
+ val n: IntParam = new IntParam(this, "n", "minimum number of elements per n-gram (>=1)",
+ ParamValidators.gtEq(1))
+
+ /**
+ * Maximum n-gram length, greater than or equal to `n`.
+ * All values of m such that n <= m <= maxN will be used.
+ * Default: 2, bigram features
+ * @group param
+ */
+ @Since("From which version?")
+ val maxN: IntParam = new IntParam(this, "maxN", "maximum number elements per n-gram (>=n)",
--- End diff --
Should this have a default value? Also, perhaps add an explanation that if maxN is unset, maxN = n.
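A minimal plain-Scala sketch of the semantics under discussion, assuming that an unset `maxN` falls back to `n` (as suggested above) and that `maxN < n` is rejected. `MultiNGramSketch`, `effectiveMaxN` and `ngrams` are hypothetical names, not part of Spark, and the sketch deliberately avoids the Spark `Params` API; it only illustrates "all values of m such that n <= m <= maxN" using a sliding window over the tokens.

```scala
// Hypothetical sketch (not Spark code): plain Scala, no Spark dependency.
// Assumes an unset maxN falls back to n, and that maxN < n is an error.
object MultiNGramSketch {

  /** Effective upper bound: maxN if set, otherwise n (the suggested fallback). */
  def effectiveMaxN(n: Int, maxN: Option[Int]): Int = {
    require(n >= 1, s"n must be >= 1, got $n")
    val upper = maxN.getOrElse(n)
    // Cross-parameter check: a single-param validator cannot express maxN >= n.
    require(upper >= n, s"maxN ($upper) must be >= n ($n)")
    upper
  }

  /** All m-grams for every m with n <= m <= maxN, shortest lengths first. */
  def ngrams(tokens: Seq[String], n: Int, maxN: Option[Int] = None): Seq[String] = {
    val upper = effectiveMaxN(n, maxN)
    (n to upper).flatMap { m =>
      // Full windows only: inputs shorter than m contribute no m-grams.
      tokens.iterator.sliding(m).withPartial(false).map(_.mkString(" ")).toList
    }
  }

  def main(args: Array[String]): Unit = {
    val tokens = Seq("a", "b", "c")
    println(ngrams(tokens, 2))          // maxN unset: same as plain bigrams
    println(ngrams(tokens, 1, Some(2))) // unigrams followed by bigrams
  }
}
```

Treating `maxN` as optional here keeps pipelines that only set `n` behaving as before. By contrast, an independent fixed default of 2 (as in the proposed doc comment) would make setting `n = 3` alone violate `maxN >= n`, which is one argument for documenting the fallback instead.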