Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22236#discussion_r212833878
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/AssociationRules.scala ---
    @@ -61,6 +61,18 @@ class AssociationRules private[fpm] (
        */
       @Since("1.5.0")
       def run[Item: ClassTag](freqItemsets: RDD[FreqItemset[Item]]): RDD[Rule[Item]] = {
    +    run(freqItemsets, Map.empty[Item, Long])
    +  }
    +
    +  /**
    +   * Computes the association rules with confidence above `minConfidence`.
    +   * @param freqItemsets frequent itemset model obtained from [[FPGrowth]]
    +   * @param itemSupport map containing an item and its support count
    +   * @return an `RDD[Rule[Item]]` containing the association rules. The rules will
    +   *         also be able to compute the lift metric.
    +   */
    +  @Since("2.4.0")
    +  def run[Item: ClassTag](freqItemsets: RDD[FreqItemset[Item]],
    +      itemSupport: Map[Item, Long]): RDD[Rule[Item]] = {
    --- End diff --
    
    So if I understand this correctly, and I may not, FPGrowthModel just holds 
frequent item sets. It's only association rules where the lift computation is 
needed. In the course of computing association rules, you can compute item 
support here. Why does it need to be saved with the model? I can see it might 
be an optimization, but it also introduces complexity (and compatibility 
issues?) here. It may be pretty fast to compute right here, though. You already end up 
with `(..., (consequent, count))` in candidates, from which you can get the 
total consequent counts directly.
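
To make that concrete, here is a minimal, hypothetical sketch (an editor's illustration, not this PR's code) of the general idea: derive per-item support counts during rule generation rather than saving them with FPGrowthModel, and use them together with the total transaction count to compute lift. It reads the counts off the singleton frequent itemsets that FPGrowth already produces; whether they could instead be taken straight from `candidates` is the question above. The object `ItemSupportSketch`, the helper `itemCounts`, and the toy counts are assumptions for illustration only.

```scala
// A minimal, hypothetical sketch (not this PR's implementation) of computing item
// support during rule generation instead of persisting it with FPGrowthModel.
import org.apache.spark.SparkContext
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
import org.apache.spark.rdd.RDD

object ItemSupportSketch {

  // One possible source of per-item support counts: the singleton frequent itemsets
  // that FPGrowth already emits. Nothing extra needs to be stored on the model.
  def itemCounts(freqItemsets: RDD[FreqItemset[String]]): Map[String, Long] =
    freqItemsets
      .filter(_.items.length == 1)
      .map(fi => (fi.items.head, fi.freq))
      .collectAsMap()
      .toMap

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "item-support-sketch")

    // Toy frequent itemsets; the counts are made up for illustration.
    val freqItemsets: RDD[FreqItemset[String]] = sc.parallelize(Seq(
      new FreqItemset(Array("a"), 8L),
      new FreqItemset(Array("b"), 5L),
      new FreqItemset(Array("a", "b"), 4L)))

    val counts = itemCounts(freqItemsets)

    // Given the total number of transactions,
    //   lift(a => b) = confidence(a => b) / P(b)
    //                = (freq(a,b) / freq(a)) / (freq(b) / numTransactions).
    val numTransactions = 10L
    val confidenceAToB = 4.0 / 8.0
    val liftAToB = confidenceAToB / (counts("b").toDouble / numTransactions)
    println(s"item counts = $counts, lift(a => b) = $liftAToB") // lift = 1.0

    sc.stop()
  }
}
```

Whether deriving the counts on the fly like this is cheap enough is exactly the question raised above; either way it would avoid adding new state (and compatibility concerns) to the saved model.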


---
