Re: Spark FP-Growth algorithm for frequent sequential patterns

2015-06-28 Thread Xiangrui Meng
Hi Ping,

FYI, we just merged Feynman's PR:
https://github.com/apache/spark/pull/6997 that adds sequential pattern
support. Please check out master branch and help test. Thanks!

Best,
Xiangrui

On Wed, Jun 24, 2015 at 2:16 PM, Feynman Liang  wrote:
> There is a JIRA for this which I just submitted a PR for :)
>
> On Tue, Jun 23, 2015 at 6:09 PM, Xiangrui Meng  wrote:
>>
>> This is on the wish list for Spark 1.5. Assuming that the items from
>> the same transaction are distinct. We can still follow FP-Growth's
>> steps:
>>
>> 1. find frequent items
>> 2. filter transactions and keep only frequent items
>> 3. do NOT order by frequency
>> 4. use suffix to partition the transactions (whether to use prefix or
>> suffix doesn't really matter in this case)
>> 5. grow FP-tree locally on each partition (the data structure should
>> be the same)
>> 6. generate frequent sub-sequences
>>
>> +Feynman
>>
>> Best,
>> Xiangrui
>>
>> On Fri, Jun 19, 2015 at 10:51 AM, ping yan  wrote:
>> > Hi,
>> >
>> > I have a use case where I'd like to mine frequent sequential patterns
>> > (consider the clickpath scenario). Transaction A -> B doesn't equal
>> > Transaction B->A..
>> >
>> > From what I understand about FP-growth in general and the MLlib
>> > implementation of it, the orders are not preserved. Anyone can provide
>> > some
>> > insights or ideas in extending the algorithm to solve frequent
>> > sequential
>> > pattern mining problems?
>> >
>> > Thanks as always.
>> >
>> >
>> > Ping
>> >
>
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark FP-Growth algorithm for frequent sequential patterns

2015-06-23 Thread Xiangrui Meng
This is on the wish list for Spark 1.5. Assuming that the items from
the same transaction are distinct. We can still follow FP-Growth's
steps:

1. find frequent items
2. filter transactions and keep only frequent items
3. do NOT order by frequency
4. use suffix to partition the transactions (whether to use prefix or
suffix doesn't really matter in this case)
5. grow FP-tree locally on each partition (the data structure should
be the same)
6. generate frequent sub-sequences

+Feynman

Best,
Xiangrui

On Fri, Jun 19, 2015 at 10:51 AM, ping yan  wrote:
> Hi,
>
> I have a use case where I'd like to mine frequent sequential patterns
> (consider the clickpath scenario). Transaction A -> B doesn't equal
> Transaction B->A..
>
> From what I understand about FP-growth in general and the MLlib
> implementation of it, the orders are not preserved. Anyone can provide some
> insights or ideas in extending the algorithm to solve frequent sequential
> pattern mining problems?
>
> Thanks as always.
>
>
> Ping
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark FP-Growth algorithm for frequent sequential patterns

2015-06-19 Thread ping yan
Hi,

I have a use case where I'd like to mine frequent sequential patterns
(consider the clickpath scenario). Transaction A -> B doesn't equal
Transaction B->A..

>From what I understand about FP-growth in general and the MLlib
implementation of it, the orders are not preserved. Anyone can provide some
insights or ideas in extending the algorithm to solve frequent sequential
pattern mining problems?

Thanks as always.


Ping