Github user tashoyan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20578#discussion_r167441224
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
    @@ -158,18 +159,30 @@ class FPGrowth @Since("2.2.0") (
       }
     
       private def genericFit[T: ClassTag](dataset: Dataset[_]): FPGrowthModel = {
    +    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    +
         val data = dataset.select($(itemsCol))
    -    val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => r.getSeq[T](0).toArray)
    +    val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => r.getSeq[Any](0).toArray)
    --- End diff ---
    
    It is not only about caching. The same ArrayStoreException occurs if one tries to execute collect() on the items RDD. There is no exception when a concrete type such as String is used instead of T. The latter probably explains how it worked before: people invoked dataset.cache() in their own code, where the type parameter of the Dataset is known.
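
    For context, here is a minimal standalone sketch (not Spark code, names are made up for illustration) of why toArray can fail here: toArray allocates the result array from the ClassTag of the element type, so when that type parameter does not match the runtime class of the items (as happens when T is left unbound in genericFit), the JVM rejects the element store.

    ```scala
    import scala.reflect.ClassTag

    object ArrayStoreDemo {
      // toArray allocates an array whose element class comes from the ClassTag
      // and then stores the sequence's elements into it one by one.
      def seqToArray[T: ClassTag](xs: Seq[Any]): Array[T] =
        xs.asInstanceOf[Seq[T]].toArray

      def main(args: Array[String]): Unit = {
        // Fine: the tag matches the runtime class of the elements.
        println(seqToArray[String](Seq("a", "b")).mkString(","))

        // Throws java.lang.ArrayStoreException: the JVM refuses to store a
        // boxed Integer into the String[] that the ClassTag told toArray to allocate.
        println(seqToArray[String](Seq(1, 2)).mkString(","))
      }
    }
    ```

    Reading the items as Seq[Any] makes toArray allocate an Object[], which accepts any element, which is presumably why the change above avoids the exception.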

