[jira] [Commented] (SPARK-23318) FP-growth: WARN FPGrowth: Input data is not cached
[ https://issues.apache.org/jira/browse/SPARK-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359969#comment-16359969 ]

Apache Spark commented on SPARK-23318:
--------------------------------------

User 'tashoyan' has created a pull request for this issue:
https://github.com/apache/spark/pull/20578

> FP-growth: WARN FPGrowth: Input data is not cached
> --------------------------------------------------
>
>                 Key: SPARK-23318
>                 URL: https://issues.apache.org/jira/browse/SPARK-23318
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.1
>            Reporter: Arseniy Tashoyan
>            Priority: Minor
>              Labels: MLLib, fp-growth
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When running FPGrowth.fit() from the _ml_ package, one can see a warning:
> WARN FPGrowth: Input data is not cached.
> This warning occurs even if the dataset of transactions is cached.
> Actually, this warning comes from the FPGrowth implementation in the old _mllib_
> package. The new FPGrowth performs some transformations on the input data set of
> transactions and then passes it to the old FPGrowth without caching. Hence
> the warning.
> The problem looks similar to SPARK-18356.
> If you don't mind, I can push a similar fix:
> {code}
> // ml.FPGrowth
> val handlePersistence = dataset.storageLevel == StorageLevel.NONE
> if (handlePersistence) {
>   // cache the data
> }
> // then call mllib.FPGrowth
> // finally unpersist the data
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
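For readers following the thread, the persistence handling proposed in the issue description might look roughly like the sketch below. It is an illustration written against the public org.apache.spark.mllib.fpm.FPGrowth API, not the code from the pull request. The object name CachedFPGrowthSketch, the method name run, the Dataset[Seq[String]] input type, and the choice of StorageLevel.MEMORY_AND_DISK are all assumptions made for this example. It needs Spark (2.1 or later, for Dataset.storageLevel) on the classpath.

```scala
import org.apache.spark.mllib.fpm.{FPGrowth, FPGrowthModel}
import org.apache.spark.sql.Dataset
import org.apache.spark.storage.StorageLevel

// Hypothetical helper, not part of Spark: wraps the old mllib FPGrowth
// and caches its input only when the caller has not cached the dataset.
object CachedFPGrowthSketch {

  def run(transactions: Dataset[Seq[String]], minSupport: Double): FPGrowthModel[String] = {
    // If the caller already cached the dataset, leave persistence to them.
    val handlePersistence = transactions.storageLevel == StorageLevel.NONE

    // The transformation step: mllib FPGrowth expects an RDD[Array[Item]].
    // This derived RDD is what mllib inspects when it logs the warning.
    val items = transactions.rdd.map(_.toArray[String])

    if (handlePersistence) {
      // Assumed storage level; any level other than NONE avoids the warning.
      items.persist(StorageLevel.MEMORY_AND_DISK)
    }

    try {
      // Then call mllib.FPGrowth.
      new FPGrowth().setMinSupport(minSupport).run(items)
    } finally {
      // Finally unpersist the data, but only if this method cached it.
      if (handlePersistence) {
        items.unpersist()
      }
    }
  }
}
```

One point that may be worth checking in a real fix: if the frequent itemsets held by the returned model are evaluated lazily, actions on them after unpersist() could recompute from the uncached input, so a caller may want to materialize the result before the cache is released.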
[jira] [Commented] (SPARK-23318) FP-growth: WARN FPGrowth: Input data is not cached
[ https://issues.apache.org/jira/browse/SPARK-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357482#comment-16357482 ]

Arseniy Tashoyan commented on SPARK-23318:
------------------------------------------

I want. But I'm short on time now. Will do.
[jira] [Commented] (SPARK-23318) FP-growth: WARN FPGrowth: Input data is not cached
[ https://issues.apache.org/jira/browse/SPARK-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356111#comment-16356111 ]

Sean Owen commented on SPARK-23318:
-----------------------------------

[~tashoyan] did you want to submit a PR for this?
[jira] [Commented] (SPARK-23318) FP-growth: WARN FPGrowth: Input data is not cached
[ https://issues.apache.org/jira/browse/SPARK-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350309#comment-16350309 ]

Sean Owen commented on SPARK-23318:
-----------------------------------

Yes, a similar change sounds fine.