Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20578#discussion_r167439487
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
    @@ -158,18 +159,30 @@ class FPGrowth @Since("2.2.0") (
       }
     
       private def genericFit[T: ClassTag](dataset: Dataset[_]): FPGrowthModel 
= {
    +    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    +
         val data = dataset.select($(itemsCol))
    -    val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => 
r.getSeq[T](0).toArray)
    +    val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => 
r.getSeq[Any](0).toArray)
    --- End diff --
    
    Hm, I wonder how it worked before then? maybe it is related to the caching.
    But then this changes the type of `items`, and therefore `parentModel`. 
Maybe it doesn't matter, just surprising.
    
    T is a ClassTag here and `.toArray` wouldn't work otherwise. Small similar 
example:
    
    ```
    scala> def foo[T:scala.reflect.ClassTag](s: Seq[T]) = s.toArray
    foo: [T](s: Seq[T])(implicit evidence$1: scala.reflect.ClassTag[T])Array[T]
    
    scala> def foo[T](s: Seq[T]) = s.toArray
    <console>:11: error: No ClassTag available for T
           def foo[T](s: Seq[T]) = s.toArray
                                     ^
    
    scala> def foo[T:scala.reflect.ClassTag](s: Seq[T]) = s.toArray
    foo: [T](s: Seq[T])(implicit evidence$1: scala.reflect.ClassTag[T])Array[T]
    
    scala> foo(Seq("foo", "bar"))
    res3: Array[String] = Array(foo, bar)
    ```
    
    oh well if it works with this change and not without, seems OK, just an 
interesting curiosity.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to