Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22236

@srowen then what about recomputing when reading saved models? This seems like a good compromise to me: it avoids writing the extra data, it allows having lift for old models, and it doesn't introduce any perf regression when creating a model. Of course, how "terribly expensive" it is depends on how much data there is, etc. Anyway, it is one pass over the `freqItemset` RDD. Since that RDD is not cached, it means we have to regenerate all the possible itemsets and perform an aggregation on them. Maybe that is not the most expensive part of the algorithm, but saving it seems worth it to me.
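For reference, the recomputation being discussed boils down to deriving lift for each rule X => Y from the frequent-itemset supports alone. A minimal sketch, assuming plain dictionaries of itemset counts (hypothetical names, not the PR's actual code or the Spark API):

```python
# Hedged sketch: recomputing lift for a rule X => Y from frequent-itemset
# counts alone -- the kind of extra pass over the frequent itemsets the
# comment describes. `freq`, `total`, and `lift` are illustrative names.
def lift(freq, total, antecedent, consequent):
    """freq maps frozenset itemsets to absolute counts; total is the
    number of transactions. Returns None if any needed support is missing."""
    f_union = freq.get(antecedent | consequent)
    f_ante = freq.get(antecedent)
    f_cons = freq.get(consequent)
    if f_union is None or f_ante is None or f_cons is None:
        return None
    confidence = f_union / f_ante          # P(Y | X)
    support_cons = f_cons / total          # P(Y)
    return confidence / support_cons       # lift = P(Y | X) / P(Y)

freq = {
    frozenset({"a"}): 6,
    frozenset({"b"}): 4,
    frozenset({"a", "b"}): 3,
}
# confidence(a => b) = 3/6 = 0.5; support(b) = 4/8 = 0.5; lift = 1.0
print(lift(freq, 8, frozenset({"a"}), frozenset({"b"})))  # 1.0
```

The point of the comment is that none of these inputs need to be persisted with the model: they can be recomputed from the frequent itemsets at load time, at the cost of one aggregation pass.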