Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22236
@srowen then what about recomputing when reading saved models? This seems a
good compromise to me: it avoids writing the extra data, it allows having
lift for old models, and it doesn't introduce any perf regression when
creating a model.
Of course, how "terribly expensive" it is depends on how much data there is,
etc. Anyway, it is a pass over the `freqItemset` RDD. Since that RDD is not
cached, it means we have to regenerate all the possible itemsets and perform
an aggregation on them. Maybe that is not the most expensive part of the
algorithm, but avoiding it seems worthwhile to me.
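To make the trade-off concrete, here is a minimal sketch of what "recomputing lift on load" amounts to. This is not the Spark implementation (which would do one aggregation pass over the `freqItemset` RDD); plain Python dicts stand in for the itemset-to-frequency data a saved model already contains, and the item names and counts are made up for illustration:

```python
def lift(freq, num_transactions, antecedent, consequent):
    """lift(A -> B) = confidence(A -> B) / support(B)
                    = (freq(A u B) / freq(A)) / (freq(B) / N)

    All inputs are already available in the frequent itemsets a saved
    model stores, which is why lift can be recomputed at read time.
    """
    union = tuple(sorted(set(antecedent) | set(consequent)))
    confidence = freq[union] / freq[tuple(sorted(antecedent))]
    support_consequent = freq[tuple(sorted(consequent))] / num_transactions
    return confidence / support_consequent

# Toy frequent-itemset counts over N = 10 transactions (assumed data).
freq = {
    ("bread",): 6,
    ("butter",): 5,
    ("bread", "butter"): 4,
}
print(lift(freq, 10, ["bread"], ["butter"]))  # (4/6) / (5/10) = 1.333...
```

The point is that no new information needs to be written alongside the model; the cost is one extra pass over the stored itemsets each time an old model is loaded.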
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]