Re: Sequential Pattern Mining

Ted Dunning Sun, 27 Nov 2011 16:15:08 -0800

There are several good ways to deal with this.  The idea of super-products,
which are composite features derived from history, is a good one.  I would
recommend that you limit the number of such super-features by first finding
which products co-occur within a reasonable time window more often than you
would expect by chance.
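
A minimal Python sketch of that first step. It assumes the data looks like the (customer, date, product) rows in the quoted sample below and that a 365-day window counts as "reasonable"; both the window and the function name `windowed_pairs` are illustrative choices, not Mahout code. It only counts ordered pairs of distinct products bought by the same customer within the window. Deciding which counts are higher than expected is a separate significance test.

```python
from collections import Counter
from datetime import date


def windowed_pairs(records, window_days):
    """Count ordered (earlier, later) product pairs bought by the same
    customer no more than window_days apart (hypothetical helper)."""
    by_customer = {}
    for customer, day, product in records:
        by_customer.setdefault(customer, []).append((day, product))

    counts = Counter()
    for events in by_customer.values():
        events.sort()  # chronological order within each customer
        for a in range(len(events)):
            day_a, first = events[a]
            for b in range(a + 1, len(events)):
                day_b, second = events[b]
                if (day_b - day_a).days > window_days:
                    break  # sorted, so every later event is even further away
                if first != second:  # ignore repeat purchases of one product
                    counts[(first, second)] += 1
    return counts


# The sample rows from the quoted message.
records = [
    (101, date(2007, 6, 30), "Product 1"),
    (101, date(2007, 8, 12), "Product 3"),
    (101, date(2008, 12, 20), "Product 4"),
    (102, date(2008, 9, 10), "Product 3"),
    (102, date(2008, 9, 12), "Product 5"),
    (102, date(2009, 1, 20), "Product 5"),
]
print(windowed_pairs(records, window_days=365))
# Product 3 -> Product 5 is counted twice, Product 1 -> Product 3 once.
```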

The co-occurrence analysis system in Mahout can be misused for this analysis
by building one document per user per sliding window period.  This is a bit
flawed: since the sliding windows overlap, the appearance of a transaction in
multiple documents is not really an indication of independent appearances.
Also, the intermediate window documents are much larger than you might like,
and they won't take ordering into account.
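
To see why overlapping windows inflate the evidence, here is a small hypothetical Python sketch. The integer day numbers, the windows starting at day 0, and the helper names are all assumptions. One user buys A on day 20 and B on day 25; with 30-day windows sliding by 10 days, that single pair lands in three documents, and because each document is an unordered set, the fact that A came before B is lost.

```python
from collections import Counter
from itertools import combinations


def sliding_window_documents(events, width, step):
    """Turn one user's (day, product) events into one set-valued
    document per window [start, start + width), sliding by step."""
    last_day = max(day for day, _ in events)
    documents = []
    start = 0
    while start <= last_day:
        doc = {product for day, product in events if start <= day < start + width}
        if doc:
            documents.append(doc)
        start += step
    return documents


def cooccurrence_counts(documents):
    """Count unordered product pairs per document, as a plain
    co-occurrence analysis would."""
    counts = Counter()
    for doc in documents:
        for pair in combinations(sorted(doc), 2):
            counts[pair] += 1
    return counts


docs = sliding_window_documents([(20, "A"), (25, "B")], width=30, step=10)
print(len(docs))                  # 3 overlapping windows contain both purchases
print(cooccurrence_counts(docs))  # the single A, B pair is counted 3 times
```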

A better approach is to adapt the existing co-occurrence code.  The basic
data you need to collect are:

- the number of times each product appears in a single user's transaction
history before another product;

- the number of times each product appears in a transaction history after
another product;

- the number of times product i appears after product j.
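
A sketch of collecting those counts in Python. It assumes each history is a time-ordered list of product IDs and that "before" means anywhere earlier in the same history, not only immediately before; repeat purchases of the same product are skipped. The function names are hypothetical. Together with the total number of ordered pairs, the three counts fill in the 2x2 contingency table that an LLR test needs.

```python
from collections import Counter


def sequence_counts(histories):
    """histories: list of time-ordered product lists, one per user.
    Returns (before, after, pairs, total) over all ordered pairs of
    distinct products (j earlier, i later) within each history."""
    before, after, pairs = Counter(), Counter(), Counter()
    total = 0
    for history in histories:
        for a in range(len(history)):
            for b in range(a + 1, len(history)):
                j, i = history[a], history[b]
                if i == j:
                    continue  # skip repeat purchases of the same product
                before[j] += 1      # j appears before some other product
                after[i] += 1       # i appears after some other product
                pairs[(j, i)] += 1  # i appears after j
                total += 1
    return before, after, pairs, total


def contingency(j, i, before, after, pairs, total):
    """2x2 table for "j then i": rows = first item is j / is not j,
    columns = second item is i / is not i."""
    k11 = pairs[(j, i)]             # j followed by i
    k12 = before[j] - k11           # j followed by something other than i
    k21 = after[i] - k11            # i preceded by something other than j
    k22 = total - k11 - k12 - k21   # neither j first nor i second
    return k11, k12, k21, k22


histories = [["P1", "P3", "P4"], ["P3", "P5", "P5"]]
before, after, pairs, total = sequence_counts(histories)
print(contingency("P1", "P3", before, after, pairs, total))  # (1, 1, 0, 3)
```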

You can then use the LLR (log-likelihood ratio) code in Mahout to find cases
where a product sequence occurs anomalously often.  Next, use a Bloom filter
or a similar data structure to analyze histories so that you emit products
and super-products as input to a conventional collaborative filtering
analysis.
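
A Python sketch of both steps. The `llr` function uses the entropy form of the G² (log-likelihood ratio) statistic for a 2x2 table. As far as I know this is the same formulation as Mahout's `LogLikelihood.logLikelihoodRatio(k11, k12, k21, k22)` in `org.apache.mahout.math.stats`, but this is a reimplementation, not Mahout's code. The `BloomFilter` class, the `"j>i"` super-product token format, the toy counts, and the cutoff of 10.83 (roughly the chi-squared critical value at p = 0.001 with one degree of freedom) are illustrative assumptions.

```python
import hashlib
from math import log


def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy: N log N - sum(k log k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off


class BloomFilter:
    """Bit-array set with num_hashes positions per key (double hashing)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits  # assumed to be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + n * h2) % self.num_bits for n in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def emit_features(history, significant):
    """Plain products plus "j>i" super-products found in the filter."""
    features = list(dict.fromkeys(history))  # distinct products, order kept
    for a in range(len(history)):
        for b in range(a + 1, len(history)):
            if history[a] == history[b]:
                continue
            token = f"{history[a]}>{history[b]}"
            if token in significant and token not in features:
                features.append(token)
    return features


# Toy usage with made-up counts: keep pairs whose LLR clears the cutoff.
significant = BloomFilter()
for token, table in {"P1>P3": (100, 0, 0, 100), "P2>P5": (10, 10, 10, 10)}.items():
    if llr(*table) > 10.83:
        significant.add(token)
print(emit_features(["P1", "P3", "P4"], significant))
```

A Bloom filter never misses a key that was added, but it can occasionally report one that was not, so a few spurious super-products may slip through. A plain set avoids that if memory allows.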

The second major approach to this problem is to build a separate classifier
for each product of interest.  I wouldn't recommend that if you have many
possible products, but it can work very well if you have a reasonably small
number of products (say, a few hundred or a few thousand) that you might
recommend.
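
A minimal sketch of one such per-product classifier in Python. It assumes the features are simply the set of products already bought and the label is whether the next purchase is the target product. It uses plain batch-gradient logistic regression for clarity; the function names, learning rate, iteration count, and toy histories are assumptions. In practice you would more likely use a library implementation (Mahout has SGD-based logistic regression) and add super-product features. You would train one model per product of interest, e.g. `{p: train(histories, p) for p in products}`.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def training_examples(histories, target):
    """One example per purchase after the first: (products seen so far,
    1.0 if that purchase is the target product else 0.0)."""
    examples = []
    for history in histories:
        for t in range(1, len(history)):
            examples.append((set(history[:t]), 1.0 if history[t] == target else 0.0))
    return examples


def train(histories, target, lr=1.0, iterations=5000):
    """Full-batch gradient descent on the average logistic loss."""
    examples = training_examples(histories, target)
    n = len(examples)
    weights, bias = {}, 0.0
    for _ in range(iterations):
        grad_w, grad_b = {}, 0.0
        for features, label in examples:
            z = bias + sum(weights.get(f, 0.0) for f in features)
            err = sigmoid(z) - label  # d(loss)/dz for logistic loss
            grad_b += err
            for f in features:
                grad_w[f] = grad_w.get(f, 0.0) + err
        bias -= lr * grad_b / n
        for f, g in grad_w.items():
            weights[f] = weights.get(f, 0.0) - lr * g / n
    return bias, weights


def predict(model, history):
    """Estimated probability that the next purchase is the target."""
    bias, weights = model
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in set(history)))


# Made-up histories, one model for the target "P4".
histories = [["P1", "P3", "P4"], ["P1", "P3", "P4"], ["P2", "P5"], ["P3", "P5"]]
model = train(histories, target="P4")
print(predict(model, ["P1", "P3"]))  # above 0.5
print(predict(model, ["P2"]))        # below 0.5
```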

On Sun, Nov 27, 2011 at 2:09 AM, Nishant Chandra
<[email protected]> wrote:

> Use case is related to purchase transactions.
>
> Sample data set:
> Customer ID   Acquisition time    Products
> 101           30 June 2007        Product 1
> 101           12 August 2007      Product 3
> 101           20 December 2008    Product 4
> 102           10 September 2008   Product 3
> 102           12 September 2008   Product 5
> 102           20 January 2009     Product 5
> ...
>
> Sample rule:
> Rule ID   Consequent   Antecedents                Support %   Confidence %
> Rule 1    Product 4    Product 1 then Product 3   57.1        75.0
>
> I want to identify rules such as: after acquiring product 1 and then
> product 3, customers have an increased likelihood
> (75%) of purchasing product 4 next.
>
> Thanks,
> Nishant
>
>
> On Sun, Nov 27, 2011 at 3:27 PM, Paritosh Ranjan <[email protected]>
> wrote:
> > Can you tell something about your use case?
> >
> > Paritosh
> >
> > On 27-11-2011 15:14, Nishant Chandra wrote:
> >>
> >> Hi,
> >>
> >> Is there any implementation for Sequential Pattern Mining in Mahout? I
> >> see there is an implementation of Sequential Pattern Mining but I am
> >> unsure if it can be used for my use case.
> >>
> >> Thanks,
> >> Nishant
> >>
> >
> >
>
