Manuel- Thank you for writing such a clear, concise report.
On Mon, Nov 21, 2011 at 12:48 PM, Manuel Blechschmidt
<[email protected]> wrote:

> Hello Ted,
> thanks for this advice.
>
> I hope that the open source and research communities will conduct more
> user studies and publish the results. There is still a lack of them.
>
> There are a lot of problems which can only be solved by learning from
> user interaction, not only from RMSE.
>
> Great stuff. As soon as I have more, I will try to post the results on
> this list.
>
> /Manuel
>
> Ted Dunning <[email protected]> schrieb:
>
>> Filtering recommendation lists is incredibly important. What you are
>> doing is pretty straightforward with post-processing of the
>> recommended list.
>>
>> Other things that I often recommend include:
>>
>> - dithering. This is partial randomization of your results list that
>> moves items deep in the list higher, but mostly leaves the top items
>> in place. This helps your algorithm explore more and helps avoid the
>> problem of people never clicking to the second page. Dithering can
>> make more difference than all but the largest algorithm differences.
>>
>> - anti-flood. It is important that a results list not be dominated by
>> a single kind of thing. The sectioning of your email is a form of
>> this. I often implement it by downgrading the scores of items very
>> similar to higher-scoring items. In some domains this makes a
>> night-and-day difference.
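A minimal Java sketch of the two post-processing steps Ted describes,
written against Mahout's Taste interfaces. The class name and the noise,
threshold, and penalty parameters are illustrative assumptions, not code
from this thread:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.recommender.GenericRecommendedItem;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public final class ListPostProcessing {

  private ListPostProcessing() {
  }

  // Dithering: re-rank by -log(rank) plus Gaussian noise. The head of
  // the list mostly stays put; items deep in the list occasionally
  // surface so the algorithm gets to explore.
  public static List<RecommendedItem> dither(List<RecommendedItem> ranked,
                                             double noise,
                                             Random rng) {
    final double[] score = new double[ranked.size()];
    List<Integer> order = new ArrayList<Integer>(ranked.size());
    for (int i = 0; i < ranked.size(); i++) {
      order.add(i);
      score[i] = -Math.log(i + 1.0) + noise * rng.nextGaussian();
    }
    Collections.sort(order, new Comparator<Integer>() {
      public int compare(Integer a, Integer b) {
        return Double.compare(score[b], score[a]); // descending dithered score
      }
    });
    List<RecommendedItem> result = new ArrayList<RecommendedItem>(ranked.size());
    for (int index : order) {
      result.add(ranked.get(index));
    }
    return result;
  }

  // Anti-flood: walk the list top-down and downgrade an item's score
  // whenever it is very similar to an item already kept above it, then
  // re-sort by the penalized scores.
  public static List<RecommendedItem> antiFlood(List<RecommendedItem> ranked,
                                                ItemSimilarity similarity,
                                                double threshold,
                                                float penalty) throws TasteException {
    List<RecommendedItem> rescored = new ArrayList<RecommendedItem>(ranked.size());
    for (RecommendedItem candidate : ranked) {
      float value = candidate.getValue();
      for (RecommendedItem kept : rescored) {
        if (similarity.itemSimilarity(kept.getItemID(), candidate.getItemID()) > threshold) {
          value *= penalty; // near-duplicate of a higher-scoring item
        }
      }
      rescored.add(new GenericRecommendedItem(candidate.getItemID(), value));
    }
    Collections.sort(rescored, new Comparator<RecommendedItem>() {
      public int compare(RecommendedItem a, RecommendedItem b) {
        return Float.compare(b.getValue(), a.getValue()); // descending
      }
    });
    return rescored;
  }
}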
>>
>> On Mon, Nov 21, 2011 at 3:28 PM, Manuel Blechschmidt
>> <[email protected]> wrote:
>>
>>> Thanks for the answer, Ted.
>>>
>>> On 21.11.2011, at 16:20, Ted Dunning wrote:
>>>
>>>> Your product is subject to seasonality constraints (which teas are
>>>> likely right now) and repeat buying. I would separate the
>>>> recommendation of repeat buys from the recommendation of new items.
>>>
>>> Actually I want to generate an email with diverse recommendations.
>>>
>>> Something like:
>>>
>>> Your personal top sellers:
>>> ... 3 items ...
>>>
>>> Special winter sales:
>>> ... 3 items ...
>>>
>>> This might be interesting for you:
>>> ... 6 items ...
>>>
>>> This is new in our store:
>>> ... 3 items ...
>>>
>>>> You may also find that item-item links on your web site are
>>>> helpful. These are easy to get using this system.
>>>
>>> Yes, actually the website is already using some very basic
>>> item-to-item recommendations. So I am more interested in the
>>> newsletter part, especially because I can track which items are
>>> really attractive and which aren't.
>>>
>>> /Manuel
>>>
>>>> On Mon, Nov 21, 2011 at 11:46 AM, Manuel Blechschmidt
>>>> <[email protected]> wrote:
>>>>
>>>>> Hello Sean,
>>>>>
>>>>> On 21.11.2011, at 12:16, Sean Owen wrote:
>>>>>
>>>>>> Yes, because you have fewer items, an item-item-similarity-based
>>>>>> algorithm probably runs much faster.
>>>>>
>>>>> Thanks for your blazing-fast feedback.
>>>>>
>>>>>> I would not necessarily use the raw number of kg as a preference.
>>>>>> It's not really true that someone who buys 10kg of an item likes
>>>>>> it 10x more than one of which he buys 1kg. Maybe the second spice
>>>>>> is much more valuable? I would at least try taking the logarithm
>>>>>> of the weight, but I think this is very noisy as a proxy for
>>>>>> "preference". It creates illogical leaps -- because one user
>>>>>> bought 85kg of X, and Y is "similar" to X, this would conclude
>>>>>> that you're somewhat likely to buy 85kg of Y too. I would
>>>>>> probably not use weight at all this way.
>>>>>
>>>>> Thanks for these suggestions. I will consider integrating a
>>>>> logarithmic weight into the recommender. At the moment I am more
>>>>> concerned with getting the user feedback component working. From
>>>>> some manual tests I can already tell that the recommendations for
>>>>> some users make sense.
>>>>>
>>>>> Based on my own profile I can tell that when I buy more of a
>>>>> certain product, I also like the product more.
>>>>>
>>>>> I am also thinking about some seasonal tweaking. Tea is a very
>>>>> seasonal product; during winter and Christmas other flavors are
>>>>> sold than in summer.
>>>>> http://diuf.unifr.ch/main/is/sites/diuf.unifr.ch.main.is/files/documents/publications/WS07-08-011.pdf
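One way to prototype Sean's log suggestion is to rebuild the DataModel
with log-scaled quantities before any similarity is computed. A minimal
sketch against the Taste API, assuming the raw model holds gram amounts
(all >= 50 here, so a plain log is safe); the class name is mine:

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

public final class LogWeightModel {

  private LogWeightModel() {
  }

  public static DataModel logTransform(DataModel raw) throws TasteException {
    FastByIDMap<PreferenceArray> userData =
        new FastByIDMap<PreferenceArray>(raw.getNumUsers());
    LongPrimitiveIterator userIDs = raw.getUserIDs();
    while (userIDs.hasNext()) {
      long userID = userIDs.nextLong();
      PreferenceArray prefs = raw.getPreferencesFromUser(userID);
      PreferenceArray scaled = new GenericUserPreferenceArray(prefs.length());
      for (int i = 0; i < prefs.length(); i++) {
        scaled.setUserID(i, userID);
        scaled.setItemID(i, prefs.getItemID(i));
        // log dampens the gap between a 50g sample and an 85kg bulk order
        scaled.setValue(i, (float) Math.log(prefs.getValue(i)));
      }
      userData.put(userID, scaled);
    }
    return new GenericDataModel(userData);
  }
}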
>>>>>> It is therefore not surprising that log-likelihood works well,
>>>>>> since it actually ignores this value.
>>>>>>
>>>>>> (You mentioned RMSE, but your evaluation metric is
>>>>>> average-absolute-difference -- L1, not L2.)
>>>>>
>>>>> You are right, RMSE (root-mean-squared error) is wrong. I think it
>>>>> is MAE (mean absolute error).
>>>>>
>>>>>> This is quite a small data set, so you should have no performance
>>>>>> issues. Your evaluations, which run over all users in the data
>>>>>> set, are taking mere seconds. I am sure you could get away with
>>>>>> much less memory/processing if you like.
>>>>>
>>>>> This is by far good enough. The more important part is the
>>>>> newsletter sending. I have to generate about 10,000 emails; that
>>>>> causes more headaches than the recommender.
>>>>>
>>>>> /Manuel
>>>>>
>>>>>> On Mon, Nov 21, 2011 at 11:06 AM, Manuel Blechschmidt
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Hello Mahout team, hello users,
>>>>>>> a friend and I are currently evaluating recommendation
>>>>>>> techniques for personalizing a newsletter for a company selling
>>>>>>> tea, spices and some other products. Mahout is such a great
>>>>>>> product that it saves me hours of time and a lot of money;
>>>>>>> because I want to give something back, I am writing this small
>>>>>>> case study to the mailing list.
>>>>>>>
>>>>>>> I am conducting an offline test of which recommender is the most
>>>>>>> accurate one. Further, I am interested in runtime behavior such
>>>>>>> as memory consumption and speed.
>>>>>>>
>>>>>>> The data contains implicit feedback. The preference of a user is
>>>>>>> the amount in grams that he bought of a certain product (453 g ~
>>>>>>> 1 pound). If a certain product does not have this data, it is
>>>>>>> replaced with 50. So basically I want Mahout to predict how much
>>>>>>> of a certain product a user will buy next. This is also helpful
>>>>>>> for demand planning. I am currently not using any time data
>>>>>>> because I did not find a recommender which uses such data.
>>>>>>>
>>>>>>> Users: 12858
>>>>>>> Items: 5467
>>>>>>> Preferences: 121304
>>>>>>> MaxPreference: 85850.0 (meaning that there is someone who
>>>>>>> ordered 85 kg of a certain tea or spice)
>>>>>>> MinPreference: 50.0
>>>>>>>
>>>>>>> Here are the pure benchmarks for accuracy in RMSE. They change
>>>>>>> during every run of the evaluation (~15%):
>>>>>>>
>>>>>>> Random (baseline): 43045.380570443434
>>>>>>>   RandomRecommender(model)
>>>>>>>   (Time: ~0.3s) (Memory: 16MB)
>>>>>>> Item-based with Pearson correlation: 315.5804958647985
>>>>>>>   GenericItemBasedRecommender(model,
>>>>>>>   PearsonCorrelationSimilarity(model))
>>>>>>>   (Time: ~1s) (Memory: 35MB)
>>>>>>> Item-based with uncentered cosine: 198.25393235323375
>>>>>>>   GenericItemBasedRecommender(model,
>>>>>>>   UncenteredCosineSimilarity(model))
>>>>>>>   (Time: ~1s) (Memory: 32MB)
>>>>>>> Item-based with log-likelihood: 176.45243607278724
>>>>>>>   GenericItemBasedRecommender(model,
>>>>>>>   LogLikelihoodSimilarity(model))
>>>>>>>   (Time: ~5s) (Memory: 42MB)
>>>>>>> User-based (3 neighbors) with Pearson correlation: 1378.1188069379868
>>>>>>>   GenericUserBasedRecommender(model, NearestNUserNeighborhood(3,
>>>>>>>   PearsonCorrelationSimilarity(model), model),
>>>>>>>   PearsonCorrelationSimilarity(model))
>>>>>>>   (Time: ~52s) (Memory: 57MB)
>>>>>>> User-based (20 neighbors) with Pearson correlation: 1144.1905989614288
>>>>>>>   GenericUserBasedRecommender(model, NearestNUserNeighborhood(20,
>>>>>>>   PearsonCorrelationSimilarity(model), model),
>>>>>>>   PearsonCorrelationSimilarity(model))
>>>>>>>   (Time: ~51s) (Memory: 57MB)
>>>>>>> SlopeOne: 464.8989330869532
>>>>>>>   SlopeOneRecommender(model)
>>>>>>>   (Time: ~4s) (Memory: 604MB)
>>>>>>> SVD-based: 326.1050823499026
>>>>>>>   ALSWRFactorizer(model, 100, 0.3, 5)
>>>>>>>   (Time: ) (Memory: 691MB)
>>>>>>>
>>>>>>> These were measured with the following method:
>>>>>>>
>>>>>>> RecommenderEvaluator evaluator =
>>>>>>>     new AverageAbsoluteDifferenceRecommenderEvaluator();
>>>>>>> double evaluation =
>>>>>>>     evaluator.evaluate(randomBased, null, myModel, 0.9, 1.0);
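For readers reproducing these numbers: the ~15% run-to-run variation
comes from the random train/test split, which RandomUtils.useTestSeed()
pins down. A sketch of the full harness around the two lines above,
showing the best-scoring configuration; the preferences.csv path is a
hypothetical placeholder:

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.common.RandomUtils;

public final class NewsletterEvaluation {

  public static void main(String[] args) throws Exception {
    RandomUtils.useTestSeed(); // fix the train/test split so runs are comparable
    DataModel model = new FileDataModel(new File("preferences.csv"));

    RecommenderBuilder itemBased = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel trainingModel) throws TasteException {
        return new GenericItemBasedRecommender(trainingModel,
            new LogLikelihoodSimilarity(trainingModel));
      }
    };

    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    // 90% of each user's preferences for training, evaluate over all users
    double mae = evaluator.evaluate(itemBased, null, model, 0.9, 1.0);
    System.out.println("Item-based / log-likelihood MAE: " + mae);
  }
}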
>>>>>>> Memory usage was about 50MB in the item-based case. Slope One
>>>>>>> and the SVD-based one seem to use the most memory (615MB &
>>>>>>> 691MB).
>>>>>>>
>>>>>>> The performance differs a lot. The fastest ones were the
>>>>>>> item-based recommenders; they took about 1 to 5 seconds
>>>>>>> (PearsonCorrelationSimilarity and UncenteredCosineSimilarity
>>>>>>> ~1s, LogLikelihoodSimilarity ~5s). The user-based ones were a
>>>>>>> lot slower.
>>>>>>>
>>>>>>> The conclusion is that in my case the item-based approach is the
>>>>>>> fastest, has the lowest memory consumption, and is the most
>>>>>>> accurate. Further, I can use the recommendedBecause function.
>>>>>>>
>>>>>>> Here is the spec of the computer: 2.3GHz Intel Core i5 (4
>>>>>>> cores), 1024 MB for the Java virtual machine.
>>>>>>>
>>>>>>> In the next step, probably in the next 2 months, I have to
>>>>>>> design a newsletter and send it to the customers. Then I can
>>>>>>> benchmark the user acceptance rate of the recommendations.
>>>>>>>
>>>>>>> Any suggestions for enhancements are appreciated. If anybody is
>>>>>>> interested in the dataset or the evaluation code, send me a
>>>>>>> private email. I might be able to convince the company to give
>>>>>>> out the dataset if the person is doing some interesting
>>>>>>> research.
>>>>>>>
>>>>>>> /Manuel
>>>>>>> --
>>>>>>> Manuel Blechschmidt
>>>>>>> Dortustr. 57
>>>>>>> 14467 Potsdam
>>>>>>> Mobil: 0173/6322621
>>>>>>> Twitter: http://twitter.com/Manuel_B

--
Lance Norskog
[email protected]
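A closing sketch of the winning configuration from the thread, the
item-based recommender with log-likelihood similarity, generating
per-user picks and the recommendedBecause explanations mentioned in the
conclusion. The preferences.csv path is a hypothetical placeholder, and
the block size of 6 mirrors the "this might be interesting for you"
section of the newsletter layout above:

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public final class NewsletterRun {

  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("preferences.csv"));
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, new LogLikelihoodSimilarity(model));

    LongPrimitiveIterator userIDs = model.getUserIDs();
    while (userIDs.hasNext()) {
      long userID = userIDs.nextLong();
      // top picks for the "this might be interesting for you" block
      List<RecommendedItem> picks = recommender.recommend(userID, 6);
      for (RecommendedItem item : picks) {
        // items the user already bought that drove this suggestion
        List<RecommendedItem> because =
            recommender.recommendedBecause(userID, item.getItemID(), 3);
        System.out.println(userID + " -> " + item.getItemID()
            + " because " + because);
      }
    }
  }
}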
