Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22236
  
    @srowen then what about recomputing lift when reading saved models? This 
seems a good compromise to me: it avoids writing the extra data, it makes lift 
available for old models, and it doesn't introduce any performance regression 
when creating a model.
    
    Of course, how "terribly expensive" it is depends a lot on how much data 
there is, and so on. In any case, it is one pass over the `freqItemset` RDD. 
Since that RDD is not cached, it means we have to regenerate all the possible 
itemsets and perform an aggregation on them. That may not be the most expensive 
part of the algorithm, but saving that cost seems worthwhile to me.
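
    Just to make the idea concrete, here is a rough sketch (not the code in 
this PR) of what recomputing lift at load time over an RDD of frequent 
itemsets could look like. The `FreqItemset` case class, `recomputeLift`, and 
`numTrainingRecords` are hypothetical names for illustration; the real model 
classes differ:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the frequent-itemset records a saved model holds.
case class FreqItemset(items: Array[String], freq: Long)

// Sketch: derive lift for single-consequent rules from the frequent itemsets
// alone, given the number of training records. lift(A => c) =
// (freq(A + c) / freq(A)) / (freq(c) / N).
def recomputeLift(
    freqItemsets: RDD[FreqItemset],
    numTrainingRecords: Long): RDD[(Array[String], String, Double)] = {
  // Support of each single item; assumed small enough to collect and broadcast.
  val itemSupport: Map[String, Double] = freqItemsets
    .filter(_.items.length == 1)
    .map(fi => (fi.items.head, fi.freq.toDouble / numTrainingRecords))
    .collect()
    .toMap
  val supportBroadcast = freqItemsets.sparkContext.broadcast(itemSupport)

  // Frequency of every itemset keyed by its sorted items, so we can look up
  // the antecedent's frequency for each candidate rule.
  val freqByItems: RDD[(Vector[String], Long)] =
    freqItemsets.map(fi => (fi.items.sorted.toVector, fi.freq))

  // For each itemset of size >= 2, hold out one item as the consequent to
  // form (antecedent, (consequent, unionFreq)) candidates.
  val candidates: RDD[(Vector[String], (String, Long))] = freqItemsets
    .filter(_.items.length >= 2)
    .flatMap { fi =>
      fi.items.map { consequent =>
        val antecedent = fi.items.filter(_ != consequent).sorted.toVector
        (antecedent, (consequent, fi.freq))
      }
    }

  // Join against antecedent frequencies; by downward closure every consequent
  // is itself a frequent single item, so the broadcast lookup always succeeds.
  candidates.join(freqByItems).map {
    case (antecedent, ((consequent, unionFreq), antecedentFreq)) =>
      val confidence = unionFreq.toDouble / antecedentFreq
      val lift = confidence / supportBroadcast.value(consequent)
      (antecedent.toArray, consequent, lift)
  }
}
```

    The point of the sketch is that everything lift needs is already in the 
saved itemsets plus a record count, so nothing extra has to be written at save 
time.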

