Hi Ted,

I don't understand the composite features and super-products that you mentioned. Please explain a bit. Are you pointing to a specific data mining method?

Thanks,
Nishant

On Mon, Nov 28, 2011 at 5:44 AM, Ted Dunning <[email protected]> wrote:
> There are several good ways to deal with this. The idea of super-products
> which are composite features that are derived from history is a good one.
> I would recommend that you limit the number of such super features by
> first finding which products cooccur within a reasonable time window more
> than you would expect.
>
> The cooccurrence analysis system in Mahout can be misused for this analysis
> by building one document per user per sliding window period. This is a bit
> flawed since the sliding windows overlap and thus the appearance of a
> transaction in multiple documents is not really an indication of
> independent appearances. Also, the intermediate window documents are much
> larger than you might like and they won't take ordering into account.
>
> A better approach is to adapt the current code. The basic data you need to
> collect are:
>
> - the number of times each product appears in a single user's transaction
> history before another product
>
> - the number of times each product appears in a transaction history after
> another product
>
> - the number of times product i appears after product j
>
> You can then use the LLR code in Mahout to find cases where a product
> sequence occurs anomalously often. You can then use a Bloom filter or
> similar data structure to analyze histories so that you emit product and
> super-products as input to a conventional collaborative filtering analysis.
>
>
> The second major approach to this problem is to build a separate classifier
> for each product of interest. I wouldn't recommend that if you have lots
> of possible products, but this can work very well if you have a reasonably
> small number of products (say a few hundred or thousand) that you might be
> about to recommend.
>
>
> On Sun, Nov 27, 2011 at 2:09 AM, Nishant Chandra
> <[email protected]> wrote:
>
>> Use case is related to purchase transactions.
>>
>> Sample data set:
>> Customer ID    Acquisition time     Products
>> 101            30 June 2007         Product 1
>> 101            12 August 2007       Product 3
>> 101            20 December 2008     Product 4
>> 102            10 September 2008    Product 3
>> 102            12 September 2008    Product 5
>> 102            20 January 2009      Product 5
>> .....
>>
>> Sample rule:
>> Rule ID    Consequent    Antecedents                 Support %    Confidence %
>> Rule 1     Product 4     Product 1 then Product 3    57.1         75.0
>>
>> I want to identify rules such as: after acquiring product 1 and then
>> product 3, customers have an increased likelihood
>> (75%) of purchasing product 4 next.
>>
>> Thanks,
>> Nishant
>>
>>
>> On Sun, Nov 27, 2011 at 3:27 PM, Paritosh Ranjan <[email protected]>
>> wrote:
>> > Can you tell something about your use case?
>> >
>> > Paritosh
>> >
>> > On 27-11-2011 15:14, Nishant Chandra wrote:
>> >>
>> >> Hi,
>> >>
>> >> Is there any implementation for Sequential Pattern Mining in Mahout? I
>> >> see there is an implementation of Sequential Pattern Mining but I am
>> >> unsure if it can be used for my use case.
>> >>
>> >> Thanks,
>> >> Nishant
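
For readers following the thread, the counting scheme Ted describes (how often a product appears before another product, how often it appears after one, and how often product i appears after product j) can be sketched in plain Python as below. This is only an illustration under stated assumptions, not Mahout's code: the function names (llr, count_ordered_pairs, score_pairs) are hypothetical, histories are assumed to be already sorted by acquisition time, each ordered pair is counted at most once per user, and the time-window restriction Ted mentions is left out for brevity. The score is the standard log-likelihood ratio (G^2) statistic for a 2x2 contingency table.

```python
import math
from collections import Counter


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) statistic for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def count_ordered_pairs(histories):
    """Count (earlier, later) product pairs over time-sorted histories."""
    pair, first, second = Counter(), Counter(), Counter()
    for history in histories:
        # Assumption: an ordered pair counts at most once per user.
        seen = {(a, b) for i, a in enumerate(history) for b in history[i + 1:]}
        for a, b in seen:
            pair[a, b] += 1   # b appears after a
            first[a] += 1     # a appears before another product
            second[b] += 1    # b appears after another product
    return pair, first, second, sum(pair.values())


def score_pairs(histories):
    """Return (llr, earlier, later) tuples, highest score first."""
    pair, first, second, total = count_ordered_pairs(histories)
    scored = []
    for (a, b), k11 in pair.items():
        k12 = first[a] - k11            # a before something other than b
        k21 = second[b] - k11           # something other than a before b
        k22 = total - k11 - k12 - k21   # pairs involving neither role
        scored.append((llr(k11, k12, k21, k22), a, b))
    return sorted(scored, reverse=True)


# The two customers from the sample data set in this thread.
histories = [
    ["Product 1", "Product 3", "Product 4"],   # customer 101
    ["Product 3", "Product 5", "Product 5"],   # customer 102
]
scores = score_pairs(histories)
for score, earlier, later in scores:
    print(f"{earlier} -> {later}: LLR = {score:.3f}")
```

With only two customers the scores carry no statistical weight; the sample is there to show the shape of the input and output. A high LLR flags pairs that occur together more or less often than expected, so k11 should also be compared with its expected count before a pair is treated as a super-product candidate.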
