Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20973#discussion_r185058005
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala ---
    @@ -44,26 +43,37 @@ object PrefixSpan {
        *
        * @param dataset A dataset or a dataframe containing a sequence column which is
        *                {{{Seq[Seq[_]]}}} type
    -   * @param sequenceCol the name of the sequence column in dataset
    +   * @param sequenceCol the name of the sequence column in dataset, rows with nulls in this column
    +   *                    are ignored
        * @param minSupport the minimal support level of the sequential pattern, any pattern that
        *                   appears more than (minSupport * size-of-the-dataset) times will be output
    -   *                  (default: `0.1`).
    -   * @param maxPatternLength the maximal length of the sequential pattern, any pattern that appears
    -   *                         less than maxPatternLength will be output (default: `10`).
    +   *                  (recommended value: `0.1`).
    +   * @param maxPatternLength the maximal length of the sequential pattern
    +   *                         (recommended value: `10`).
        * @param maxLocalProjDBSize The maximum number of items (including delimiters used in the
        *                           internal storage format) allowed in a projected database before
        *                           local processing. If a projected database exceeds this size, another
    -   *                           iteration of distributed prefix growth is run (default: `32000000`).
    -   * @return A dataframe that contains columns of sequence and corresponding frequency.
    +   *                           iteration of distributed prefix growth is run
    +   *                           (recommended value: `32000000`).
    +   * @return A `DataFrame` that contains columns of sequence and corresponding frequency.
    +   *         The schema of it will be:
    +   *          - `sequence: Seq[Seq[T]]` (T is the item type)
    +   *          - `frequency: Long`
    --- End diff --
    
    I had asked for this change from "freq" to "frequency", but I belatedly
    realized that it conflicts with the existing FPGrowth API, which uses "freq".
    It would be best to maintain consistency. Would you mind reverting to "freq"?

