GitHub user tashoyan opened a pull request: https://github.com/apache/spark/pull/20578
[SPARK-23318][ML] FP-growth: WARN FPGrowth: Input data is not cached

## What changes were proposed in this pull request?

Cache the RDD of items in ml.FPGrowth before passing it to mllib.FPGrowth. Cache it only when the user has not already cached the input dataset of transactions. This fixes the warning about uncached data emitted by mllib.FPGrowth.

## How was this patch tested?

Manually:
1. Run ml.FPGrowthExample - the warning is there
2. Apply the fix
3. Run ml.FPGrowthExample again - the warning is gone

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tashoyan/spark SPARK-23318

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20578.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20578

----
commit d17d3fbee84fcb0072d3030f3118ca18ce783e0c
Author: Arseniy Tashoyan <tashoyan@...>
Date:   2018-02-10T21:16:51Z

    [SPARK-23318][ML] Workaround for 'ArrayStoreException: [Ljava.lang.Object' when trying to cache the RDD of items.

commit e0eb8519bf09db12f5d5bc426eaf17d6488e05c1
Author: Arseniy Tashoyan <tashoyan@...>
Date:   2018-02-11T15:21:39Z

    [SPARK-23318][ML] Cache the RDD of items if the user did not cache the input dataset of transactions. This should eliminate the warning about uncached data in mllib.FPGrowth.

commit 374a49c2bf447f3ddfed655f6eda9c8cd5f45285
Author: Arseniy Tashoyan <tashoyan@...>
Date:   2018-02-11T15:23:58Z

    Merge remote-tracking branch 'upstream/master' into SPARK-23318

----
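
For readers who want to see the caching rule from the description in code form, here is a minimal sketch of the "cache only if the user did not" pattern. It is not the patch itself: the object `CacheIfNeededSketch`, the helper `withItemsCached`, and the choice of `MEMORY_AND_DISK` are assumptions made for illustration. Only `Dataset.storageLevel`, `RDD.persist`, and `RDD.unpersist` are existing Spark calls.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.storage.StorageLevel

object CacheIfNeededSketch {

  // Hypothetical helper (not part of Spark): runs `body` on `items`,
  // persisting `items` first only when the caller has NOT cached `dataset`.
  def withItemsCached[T, R](dataset: Dataset[_], items: RDD[T])(body: RDD[T] => R): R = {
    // StorageLevel.NONE means the user did not cache the input dataset.
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    // Assumption: MEMORY_AND_DISK is used here; the source does not name a level.
    if (handlePersistence) items.persist(StorageLevel.MEMORY_AND_DISK)
    try body(items)
    finally if (handlePersistence) items.unpersist() // release only what we cached
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("cache-if-needed-sketch")
      .getOrCreate()
    import spark.implicits._

    // Small transaction dataset, intentionally left uncached.
    val transactions = Seq("1 2 5", "1 2 3 5", "1 2").map(_.split(" ")).toDF("items")
    val items: RDD[Array[String]] = transactions.rdd.map(_.getSeq[String](0).toArray)

    // Stand-in for the multi-pass work that mllib.FPGrowth does on the items RDD.
    val distinctItems = withItemsCached(transactions, items) { rdd =>
      rdd.flatMap(_.toSeq).distinct().count()
    }
    println(s"distinct items: $distinctItems")

    spark.stop()
  }
}
```

Checking the storage level of the input dataset, rather than of the derived RDD, avoids caching the same data twice when the caller has already persisted the transactions.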