Re: Spark FP-Growth algorithm for frequent sequential patterns
Hi Ping, FYI, we just merged Feynman's PR: https://github.com/apache/spark/pull/6997 that adds sequential pattern support. Please check out master branch and help test. Thanks! Best, Xiangrui On Wed, Jun 24, 2015 at 2:16 PM, Feynman Liang wrote: > There is a JIRA for this which I just submitted a PR for :) > > On Tue, Jun 23, 2015 at 6:09 PM, Xiangrui Meng wrote: >> >> This is on the wish list for Spark 1.5. Assuming that the items from >> the same transaction are distinct. We can still follow FP-Growth's >> steps: >> >> 1. find frequent items >> 2. filter transactions and keep only frequent items >> 3. do NOT order by frequency >> 4. use suffix to partition the transactions (whether to use prefix or >> suffix doesn't really matter in this case) >> 5. grow FP-tree locally on each partition (the data structure should >> be the same) >> 6. generate frequent sub-sequences >> >> +Feynman >> >> Best, >> Xiangrui >> >> On Fri, Jun 19, 2015 at 10:51 AM, ping yan wrote: >> > Hi, >> > >> > I have a use case where I'd like to mine frequent sequential patterns >> > (consider the clickpath scenario). Transaction A -> B doesn't equal >> > Transaction B->A.. >> > >> > From what I understand about FP-growth in general and the MLlib >> > implementation of it, the orders are not preserved. Anyone can provide >> > some >> > insights or ideas in extending the algorithm to solve frequent >> > sequential >> > pattern mining problems? >> > >> > Thanks as always. >> > >> > >> > Ping >> > > > - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark FP-Growth algorithm for frequent sequential patterns
This is on the wish list for Spark 1.5. Assuming that the items from the same transaction are distinct. We can still follow FP-Growth's steps: 1. find frequent items 2. filter transactions and keep only frequent items 3. do NOT order by frequency 4. use suffix to partition the transactions (whether to use prefix or suffix doesn't really matter in this case) 5. grow FP-tree locally on each partition (the data structure should be the same) 6. generate frequent sub-sequences +Feynman Best, Xiangrui On Fri, Jun 19, 2015 at 10:51 AM, ping yan wrote: > Hi, > > I have a use case where I'd like to mine frequent sequential patterns > (consider the clickpath scenario). Transaction A -> B doesn't equal > Transaction B->A.. > > From what I understand about FP-growth in general and the MLlib > implementation of it, the orders are not preserved. Anyone can provide some > insights or ideas in extending the algorithm to solve frequent sequential > pattern mining problems? > > Thanks as always. > > > Ping > - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Spark FP-Growth algorithm for frequent sequential patterns
Hi, I have a use case where I'd like to mine frequent sequential patterns (consider the clickpath scenario). Transaction A -> B doesn't equal Transaction B->A.. >From what I understand about FP-growth in general and the MLlib implementation of it, the orders are not preserved. Anyone can provide some insights or ideas in extending the algorithm to solve frequent sequential pattern mining problems? Thanks as always. Ping