Re: Spark FP-Growth algorithm for frequent sequential patterns

Xiangrui Meng Tue, 23 Jun 2015 18:09:49 -0700

This is on the wish list for Spark 1.5. Assuming that the items in
the same transaction are distinct, we can still follow FP-Growth's
steps:

1. find frequent items
2. filter transactions and keep only frequent items
3. do NOT order by frequency
4. use suffix to partition the transactions (whether to use prefix or
suffix doesn't really matter in this case)
5. grow FP-tree locally on each partition (the data structure should
be the same)
6. generate frequent sub-sequences
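
The steps above can be sketched on a single machine in plain Python.
This is only an illustration under stated assumptions, not Spark or
MLlib code: the names frequent_subsequences and min_count are
hypothetical, the "partitions" are just dictionary groups keyed by the
suffix item, and the FP-tree of step 5 is swapped for a simple
recursive projection over lists to keep the example short.

```python
from collections import Counter, defaultdict


def frequent_subsequences(transactions, min_count):
    """Count ordered sub-sequences that occur in at least min_count transactions.

    Assumes the items within one transaction are distinct.
    """
    # Step 1: find frequent items.
    item_counts = Counter(item for t in transactions for item in t)
    frequent = {item for item, c in item_counts.items() if c >= min_count}

    # Steps 2-3: keep only frequent items, leaving the original order intact.
    filtered = [[item for item in t if item in frequent] for t in transactions]

    # Step 4: group by suffix item; each group stores the items preceding it.
    groups = defaultdict(list)
    for t in filtered:
        for i, item in enumerate(t):
            groups[item].append(t[:i])

    # Steps 5-6: within each group, grow patterns to the left.
    # (A list-based projection stands in for the local FP-tree here.)
    results = {}

    def grow(suffix, preceding):
        results[suffix] = len(preceding)
        counts = Counter(item for p in preceding for item in p)
        for item, c in counts.items():
            if c >= min_count:
                # Keep only what comes before `item` in each matching prefix.
                projected = [p[:p.index(item)] for p in preceding if item in p]
                grow((item,) + suffix, projected)

    for item, preceding in groups.items():
        grow((item,), preceding)
    return results


if __name__ == "__main__":
    clicks = [["A", "B", "C"], ["A", "C", "B"], ["A", "B"], ["B", "A"]]
    for pattern, support in sorted(frequent_subsequences(clicks, 2).items()):
        print(" -> ".join(pattern), support)
```

Because the order is never changed, A -> B and B -> A are counted as
different patterns: with the sample data and min_count = 2, A -> B is
reported and B -> A is not.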

+Feynman

Best,
Xiangrui

On Fri, Jun 19, 2015 at 10:51 AM, ping yan <sharon...@gmail.com> wrote:
> Hi,
>
> I have a use case where I'd like to mine frequent sequential patterns
> (consider the clickpath scenario). Transaction A -> B doesn't equal
> Transaction B -> A.
>
> From what I understand about FP-growth in general and the MLlib
> implementation of it, item order is not preserved. Can anyone provide some
> insights or ideas on extending the algorithm to solve frequent sequential
> pattern mining problems?
>
> Thanks as always.
>
>
> Ping
>

