OK.

Suppose that people who buy both milk and chocolate chip cookies are good
prospects for buying life insurance, but buying either product alone is not a
strong indicator.  You can build a special feature
milk_and_chocolate_chip_cookies in addition to the separate features for
the individual milk and chocolate_chip_cookies products.  The composite
features can be order-specific or not.
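
A minimal Python sketch of emitting such features for one customer may make
this concrete.  It assumes a history is just a list of product names in
purchase order; the function name composite_features and the composites
lookup table are made up for illustration and are not part of Mahout.

```python
from itertools import combinations


def composite_features(history, composites, order_specific=False):
    """Return single-product features plus any composite features.

    history:    product names in purchase order (hypothetical input format)
    composites: dict mapping a (first, second) product pair to a feature name
    """
    features = set(history)
    # combinations() keeps input order, so (a, b) means a was bought before b.
    for a, b in combinations(history, 2):
        name = composites.get((a, b))
        if name is None and not order_specific:
            # Order does not matter here, so accept the reversed pair too.
            name = composites.get((b, a))
        if name is not None:
            features.add(name)
    return features


composites = {
    ("milk", "chocolate_chip_cookies"): "milk_and_chocolate_chip_cookies",
}
history = ["chocolate_chip_cookies", "bread", "milk"]
unordered = composite_features(history, composites)
ordered = composite_features(history, composites, order_specific=True)
```

With order_specific=False the composite is emitted for this history; with
order_specific=True it is not, because the cookies were bought before the milk.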

The potential number of such features is huge.  Clearly, a first cut is to
limit your consideration to composites that actually appear.  You probably
should limit the number that you consider even more stringently than that.
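
One way to do that limiting, sketched in Python under some assumptions: count
the ordered product pairs that actually occur, score each with a
log-likelihood ratio (LLR) test on a 2x2 table, and keep only the top few
that occur more often than expected.  The names top_sequences and limit are
hypothetical, counting each pair once per customer is an assumption, and the
LLR function is a plain reimplementation of the standard formula, not a call
into Mahout.

```python
from collections import Counter
from itertools import combinations
from math import log


def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def top_sequences(histories, limit):
    """Return up to `limit` (score, first, second) tuples, best first."""
    first, second, pairs = Counter(), Counter(), Counter()
    for history in histories:
        # Assumption: each ordered pair counts at most once per customer.
        seen = {(a, b) for a, b in combinations(history, 2) if a != b}
        for a, b in seen:
            pairs[(a, b)] += 1
            first[a] += 1    # a appears before some other product
            second[b] += 1   # b appears after some other product
    total = sum(pairs.values())
    scored = []
    for (a, b), k11 in pairs.items():
        k12 = first[a] - k11
        k21 = second[b] - k11
        k22 = total - k11 - k12 - k21
        # LLR is two-sided, so keep only pairs seen more often than expected.
        if k11 * total > first[a] * second[b]:
            scored.append((llr(k11, k12, k21, k22), a, b))
    scored.sort(reverse=True)
    return scored[:limit]


histories = [["p1", "p3"]] * 5 + [["p2", "p4"]] * 5
best = top_sequences(histories, limit=5)
```

Only the pairs returned here would then get a composite feature; everything
else stays as plain single-product features.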

On Mon, Nov 28, 2011 at 5:21 AM, Nishant Chandra
<[email protected]> wrote:

> Hi Ted,
>
> I don't understand the composite features and super-products that you
> mentioned. Please explain a bit. Are you pointing to a specific data
> mining method?
>
> Thanks,
> Nishant
>
> On Mon, Nov 28, 2011 at 5:44 AM, Ted Dunning <[email protected]>
> wrote:
> > There are several good ways to deal with this.  The idea of
> > super-products which are composite features that are derived from
> > history is a good one.  I would recommend that you limit the number of
> > such super features by first finding which products cooccur within a
> > reasonable time window more than you would expect.
> >
> > The cooccurrence analysis system in Mahout can be misused for this
> > analysis by building one document per user per sliding window period.
> > This is a bit flawed since the sliding windows overlap and thus the
> > appearance of a transaction in multiple documents is not really an
> > indication of independent appearances.  Also, the intermediate window
> > documents are much larger than you might like and they won't take
> > ordering into account.
> >
> > A better approach is to adapt the current code.  The basic data you need
> > to collect are:
> >
> > - the number of times each product appears in a single user's transaction
> > history before another product.
> >
> > - the number of times each product appears in a transaction history after
> > another product
> >
> > - the number of times product i appears after product j.
> >
> > You can then use the LLR code in Mahout to find cases where a product
> > sequence occurs anomalously often.  You can then use a Bloom filter or
> > similar data structure to analyze histories so that you emit product and
> > super-products as input to a conventional collaborative filtering
> > analysis.
> >
> >
> > The second major approach to this problem is to build a separate
> > classifier for each product of interest.  I wouldn't recommend that if
> > you have lots of possible products, but this can work very well if you
> > have a reasonably small number of products (say a few hundred or
> > thousand) that you might be about to recommend.
> >
> >
> > On Sun, Nov 27, 2011 at 2:09 AM, Nishant Chandra
> > <[email protected]> wrote:
> >
> >> Use case is related to purchase transactions.
> >>
> >> Sample data set:
> >> Customer ID   Acquisition time     Products
> >> 101           30 June 2007         Product 1
> >> 101           12 August 2007       Product 3
> >> 101           20 December 2008     Product 4
> >> 102           10 September 2008    Product 3
> >> 102           12 September 2008    Product 5
> >> 102           20 January 2009      Product 5.....
> >>
> >> Sample rule:
> >> Rule ID   Consequent   Antecedents                Support %   Confidence %
> >> Rule 1    Product 4    Product 1 then Product 3   57.1        75.0
> >>
> >> I want to identify rules such as: after acquiring product 1 and then
> >> product 3, customers have an increased likelihood
> >> (75%) of purchasing product 4 next.
> >>
> >> Thanks,
> >> Nishant
> >>
> >>
> >> On Sun, Nov 27, 2011 at 3:27 PM, Paritosh Ranjan <[email protected]>
> >> wrote:
> >> > Can you tell something about your use case?
> >> >
> >> > Paritosh
> >> >
> >> > On 27-11-2011 15:14, Nishant Chandra wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Is there any implementation for Sequential Pattern Mining in Mahout?
> >> >> I see there is an implementation of Sequential Pattern Mining but I am
> >> >> unsure if it can be used for my use case.
> >> >>
> >> >> Thanks,
> >> >> Nishant
> >> >>
> >> >>
> >> >
> >> >
> >>
> >
>
