Github user tashoyan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20578#discussion_r167442376

--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -158,18 +159,30 @@ class FPGrowth @Since("2.2.0") (
   }

   private def genericFit[T: ClassTag](dataset: Dataset[_]): FPGrowthModel = {
+    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
+
     val data = dataset.select($(itemsCol))
-    val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => r.getSeq[T](0).toArray)
+    val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => r.getSeq[Any](0).toArray)
--- End diff --

Something I'm curious about: why does the FPGrowth contract require an `Array` of items rather than a `Seq`? First, it's strange for a contract to require a specific implementation rather than an interface. Second, it forces a redundant `toArray` conversion followed by a `toSeq` conversion back. `Seq` would be more convenient, since the `Row` class has a `getSeq` method but no `getArray`.
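
To make the round trip concrete, here is a minimal plain-Scala sketch. It does not use Spark: `distinctItemsFromArrays` and `distinctItemsFromSeqs` are hypothetical stand-ins invented for illustration, not part of the FPGrowth API, and the `rows` value only imitates what `Row.getSeq` would return.

```scala
object SeqVsArrayContract {
  // Hypothetical stand-in for an API whose contract demands Array items
  // and then converts back to Seq internally.
  def distinctItemsFromArrays(transactions: Seq[Array[String]]): Seq[Seq[String]] =
    transactions.map(_.toSeq.distinct)

  // Hypothetical alternative that accepts the Seq interface directly.
  def distinctItemsFromSeqs(transactions: Seq[Seq[String]]): Seq[Seq[String]] =
    transactions.map(_.distinct)

  def main(args: Array[String]): Unit = {
    // Imitates values obtained via Row.getSeq: they are already Seq.
    val rows: Seq[Seq[String]] = Seq(Seq("a", "b", "a"), Seq("c"))

    // Array-based contract: Seq -> Array at the call site, Array -> Seq inside.
    val viaArray = distinctItemsFromArrays(rows.map(_.toArray))

    // Seq-based contract: no conversion needed.
    val viaSeq = distinctItemsFromSeqs(rows)

    // Both paths produce the same items; only the conversions differ.
    assert(viaArray == viaSeq)
    println(viaSeq)
  }
}
```

With the `Seq`-based signature the caller passes the values through unchanged, which is the convenience argued for above.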