There are several good ways to deal with this. The idea of super-products, composite features derived from purchase history, is a good one. I would recommend limiting the number of such super-features by first finding which products co-occur within a reasonable time window more often than you would expect by chance.
The cooccurrence analysis system in Mahout can be misused for this analysis by building one document per user per sliding window period. This is a bit flawed: the sliding windows overlap, so the appearances of a transaction in multiple documents are not really independent observations. Also, the intermediate window documents are much larger than you might like, and they don't take ordering into account.

A better approach is to adapt the current code. The basic data you need to collect are:

- the number of times each product appears in a single user's transaction history before another product
- the number of times each product appears in a transaction history after another product
- the number of times product i appears after product j

You can then use the LLR code in Mahout to find cases where a product sequence occurs anomalously often. After that, you can use a Bloom filter or similar data structure to scan histories so that you emit products and super-products as input to a conventional collaborative filtering analysis.

The second major approach to this problem is to build a separate classifier for each product of interest. I wouldn't recommend that if you have lots of possible products, but it can work very well if you have a reasonably small number of products (say a few hundred or a few thousand) that you might want to recommend.

On Sun, Nov 27, 2011 at 2:09 AM, Nishant Chandra <[email protected]> wrote:
> Use case is related to purchase transactions.
>
> Sample data set:
> Customer ID   Acquisition time     Products
> 101           30 June 2007         Product 1
> 101           12 August 2007       Product 3
> 101           20 December 2008     Product 4
> 102           10 September 2008    Product 3
> 102           12 September 2008    Product 5
> 102           20 January 2009      Product 5.....
>
> Sample rule:
> Rule ID   Consequent   Antecedents                 Support %   Confidence %
> Rule 1    Product 4    Product 1 then Product 3    57.1        75.0
>
> I want to identify rules such as: after acquiring product 1 and then
> product 3, customers have an increased likelihood
> (75%) of purchasing product 4 next.
>
> Thanks,
> Nishant
>
> On Sun, Nov 27, 2011 at 3:27 PM, Paritosh Ranjan <[email protected]>
> wrote:
> > Can you tell something about your use case?
> >
> > Paritosh
> >
> > On 27-11-2011 15:14, Nishant Chandra wrote:
> >>
> >> Hi,
> >>
> >> Is there any implementation for Sequential Pattern Mining in Mahout? I
> >> see there is an implementation of Sequential Pattern Mining but I am
> >> unsure if it can be used for my use case.
> >>
> >> Thanks,
> >> Nishant
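
To make the counting and LLR scoring from the reply at the top of this thread a bit more concrete, here is a rough Python sketch. It is not Mahout code: the function names are made up for illustration, the LLR function is a standalone implementation of the usual log-likelihood ratio (G^2) statistic for a 2x2 table, and a plain Python set stands in for the Bloom filter. It also ignores timestamps, so any "reasonable time window" filtering is assumed to have been applied before the histories are passed in.

```python
import math
from collections import Counter
from itertools import combinations


def ordered_pair_counts(histories):
    """Count how many customers have product i somewhere before product j.

    `histories` maps a customer id to that customer's products in time order.
    Each ordered pair is counted at most once per customer.
    """
    pair = Counter()
    for products in histories.values():
        seen = set()
        # combinations() keeps positional order, so i always precedes j.
        for i, j in combinations(products, 2):
            if i != j:
                seen.add((i, j))
        pair.update(seen)
    return pair


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def score_sequences(pair):
    """Score every observed ordered pair (i before j) with LLR."""
    first, second = Counter(), Counter()
    for (i, j), n in pair.items():
        first[i] += n    # times i appears before some other product
        second[j] += n   # times j appears after some other product
    total = sum(pair.values())
    scores = {}
    for (i, j), k11 in pair.items():
        k12 = first[i] - k11            # i first, something other than j after
        k21 = second[j] - k11           # j second, something other than i before
        k22 = total - k11 - k12 - k21   # pairs involving neither role
        scores[(i, j)] = llr(k11, k12, k21, k22)
    return scores


def emit_features(products, interesting):
    """Emit each product plus a super-product token per interesting ordered pair.

    `interesting` is a set of (i, j) pairs here; a Bloom filter could replace it.
    """
    features = list(products)
    for i, j in combinations(products, 2):
        if (i, j) in interesting:
            features.append(f"{i}>{j}")
    return features


# Histories taken from the sample data set quoted above.
histories = {
    101: ["Product 1", "Product 3", "Product 4"],
    102: ["Product 3", "Product 5", "Product 5"],
}
pairs = ordered_pair_counts(histories)
scores = score_sequences(pairs)
```

Note that LLR is two-sided: it is large for sequences that are anomalously rare as well as anomalously frequent. To keep only the "anomalously often" cases, one option is to also require that the observed count exceeds the count expected under independence, i.e. k11 * total > first[i] * second[j], before adding a pair to the interesting set.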
