Re: Evaluation of different recommendation algorithms for 12.000 user data set

Manuel Blechschmidt Mon, 21 Nov 2011 12:48:23 -0800

Hello Ted,
thanks for this advice.

I hope that the open source and research community will conduct more user
studies and provide the results. There is still a lack of them.


There are a lot of problems which can only be solved by learning from user
interaction, not from RMSE alone.

Great stuff. As soon as I have more, I will try to post the results on this list.

/Manuel

Ted Dunning <[email protected]> wrote:

>Filtering recommendation lists is incredibly important.  What you are
>doing is pretty straightforward with post-processing of the recommended
>list.
>
>Other things that I often recommend include:
>
>- dithering.  This is partial randomization of your results list that moves
>items deep in the list higher, but mostly leaves the top items in place.
> This helps your algorithm explore more and helps avoid the problem of
>people never clicking to the second page.  Dithering can make more
>difference than all but the largest algorithm differences.
>
>- anti-flood.  It is important to not have a results list be dominated by a
>single kind of thing.  The segregation of your email is a form of this.  I
>often implement this by downgrading the scores of items very similar to
>higher scoring items.  In some domains this makes a night and day
>difference.
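A minimal sketch of how dithering could be done, assuming one common recipe that is not spelled out in this thread: give each item the synthetic score log(rank) plus Gaussian noise and re-sort by it. The class name `Dithering`, the method `dither` and the parameter `sigma` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class Dithering {
    /**
     * Returns a reordered copy of an already ranked list (best item first).
     * Each item gets the score log(rank) + sigma * gaussianNoise and the list
     * is re-sorted by that score in ascending order. Because log(rank) grows
     * slowly, deep items swap places easily while the top items mostly stay put.
     * sigma = 0 keeps the original order; larger values mix more strongly.
     */
    static <T> List<T> dither(List<T> ranked, double sigma, Random rng) {
        int n = ranked.size();
        double[] score = new double[n];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            score[i] = Math.log(i + 1) + sigma * rng.nextGaussian();
            order.add(i);
        }
        // Sort the original positions by their noisy score.
        order.sort(Comparator.comparingDouble(i -> score[i]));
        List<T> result = new ArrayList<>(n);
        for (int i : order) {
            result.add(ranked.get(i));
        }
        return result;
    }
}
```

Seeding the `Random` per user and per day would keep a list stable within one session while still varying it between visits; that choice is also an assumption.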
>
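The anti-flood idea of downgrading items that are very similar to higher scoring items could look roughly like the sketch below. It is only an illustration under assumptions: scores are positive, the similarity function is supplied by the caller, and `AntiFlood`, `Scored`, `threshold` and `penalty` are hypothetical names, not part of Mahout.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class AntiFlood {
    /** An item id together with its recommender score. */
    static final class Scored {
        final String item;
        final double score;
        Scored(String item, double score) {
            this.item = item;
            this.score = score;
        }
    }

    /**
     * Sorts the items by score (best first), then multiplies each item's score
     * by `penalty` once for every higher-scored item whose similarity to it is
     * at least `threshold`. The list is finally re-sorted by the adjusted score.
     * Assumes positive scores and 0 < penalty < 1.
     */
    static List<Scored> downgradeSimilar(List<Scored> items,
                                         ToDoubleBiFunction<String, String> similarity,
                                         double threshold, double penalty) {
        List<Scored> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        List<Scored> adjusted = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            double score = sorted.get(i).score;
            for (int j = 0; j < i; j++) {
                double sim = similarity.applyAsDouble(sorted.get(j).item, sorted.get(i).item);
                if (sim >= threshold) {
                    score *= penalty;
                }
            }
            adjusted.add(new Scored(sorted.get(i).item, score));
        }
        adjusted.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        return adjusted;
    }
}
```

With two near-identical green teas scored 10 and 9 and a black tea scored 8, a penalty of 0.5 pushes the second green tea below the black tea, so the list is no longer dominated by one kind of item.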
>On Mon, Nov 21, 2011 at 3:28 PM, Manuel Blechschmidt <
>[email protected]> wrote:
>
>> Thanks for the answer Ted.
>>
>> On 21.11.2011, at 16:20, Ted Dunning wrote:
>>
>> > Your product is subject to seasonality constraints (which teas are likely
>> > right now) and repeat buying.  I would separate out the recommendation of
>> > repeat buys from the recommendation of new items.
>>
>> Actually I want to generate an email with diverse recommendations.
>>
>> Something like:
>>
>> Your personal top sellers:
>> .. 3 items ...
>>
>> Special Winter Sales:
>> ... 3 items ...
>>
>> This might be interesting for you:
>> ... 6 items ...
>>
>> This is new in our store:
>> ... 3 items ...
>>
>> >
>> > You may also find that item-item links on your web site are helpful.
>> > These are easy to get using this system.
>>
>> Yes, actually the website is already using some very basic item-to-item
>> recommendations. So I am more interested in the newsletter part especially
>> because I can track which items are really attractive and which aren't.
>>
>> /Manuel
>>
>> >
>> > On Mon, Nov 21, 2011 at 11:46 AM, Manuel Blechschmidt <
>> > [email protected]> wrote:
>> >
>> >> Hello Sean,
>> >>
>> >> On 21.11.2011, at 12:16, Sean Owen wrote:
>> >>
>> >>> Yes, because you have fewer items, an item-item-similarity-based
>> >>> algorithm probably runs much faster.
>> >>
>> >> Thanks for your blazing fast feedback.
>> >>
>> >>>
>> >>> I would not necessarily use the raw number of kg as a preference. It's
>> >>> not really true that someone who buys 10kg of an item likes it 10x more
>> >>> than one he buys 1kg of. Maybe the second spice is much more valuable?
>> >>> I would at least try taking the logarithm of the weight, but, I think
>> >>> this is very noisy as a proxy for "preference". It creates illogical
>> >>> leaps -- because one user bought 85kg of X, and Y is "similar" to X,
>> >>> this would conclude that you're somewhat likely to buy 85kg of Y too.
>> >>> I would probably not use weight at all this way.
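Taking the logarithm of the weight, as suggested above, could be sketched as follows. The base-10 logarithm and the names `LogPreference` and `fromGrams` are assumptions made for illustration; the thread does not say which base or scaling to use.

```java
final class LogPreference {
    /**
     * Maps a purchased weight in grams to a damped preference value.
     * A tenfold larger purchase only adds 1 to the preference.
     */
    static double fromGrams(double grams) {
        if (grams <= 0) {
            throw new IllegalArgumentException("grams must be positive");
        }
        return Math.log10(grams);
    }
}
```

Under this mapping the reported range of 50 g to 85850 g shrinks to roughly 1.7 to 4.9, so a single very large order no longer dominates the similarity computation.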
>> >>
>> >> Thanks for these suggestions. I will consider integrating a logarithmic
>> >> weight into the recommender. At the moment I am more concerned with
>> >> getting the user feedback component working. From some manual tests I
>> >> can already tell that the recommendations for some users make sense.
>> >>
>> >> Based on my own profile I can tell that when I buy more of a certain
>> >> product then I also like the product more.
>> >>
>> >> I am also thinking about some seasonal tweaking. Tea is a very seasonal
>> >> product: during winter and Christmas other flavors are sold than in
>> >> summer.
>> >>
>> >> http://diuf.unifr.ch/main/is/sites/diuf.unifr.ch.main.is/files/documents/publications/WS07-08-011.pdf
>> >>
>> >>>
>> >>> It is not therefore surprising that log-likelihood works well, since it
>> >>> ignores this value actually.
>> >>>
>> >>> (You mentioned RMSE but your evaluation metric is
>> >>> average-absolute-difference -- L1, not L2).
>> >>
>> >> You are right, RMSE (root-mean-squared error) is wrong. I think it is MAE
>> >> (mean absolute error).
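The difference between the two metrics discussed here can be shown with a small self-contained sketch. It uses plain arrays and hypothetical names (`ErrorMetrics`, `meanAbsoluteError`, `rootMeanSquaredError`); it is not the Mahout evaluator code.

```java
final class ErrorMetrics {
    /** L1-style metric: the average of |actual - predicted|. */
    static double meanAbsoluteError(double[] actual, double[] predicted) {
        checkSameLength(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    /** L2-style metric: the square root of the average squared difference. */
    static double rootMeanSquaredError(double[] actual, double[] predicted) {
        checkSameLength(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }
}
```

For errors of 10, 10 and 40 grams the MAE is 20 while the RMSE is about 24.5, because squaring gives the one large miss more influence. With outliers such as an 85 kg order, the two metrics can therefore rank recommenders differently.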
>> >>
>> >>>
>> >>> This is quite a small data set so you should have no performance
>> >>> issues. Your evaluations, which run over all users in the data set, are
>> >>> taking mere seconds. I am sure you could get away with much less
>> >>> memory/processing if you like.
>> >>
>> >> This is more than good enough. The more important part is sending the
>> >> newsletter. I have to generate about 10.000 emails, which is a bigger
>> >> headache than the recommender.
>> >>
>> >> /Manuel
>> >>
>> >>>
>> >>>
>> >>> On Mon, Nov 21, 2011 at 11:06 AM, Manuel Blechschmidt <
>> >>> [email protected]> wrote:
>> >>>
>> >>>> Hello Mahout Team, hello users,
>> >>>> a friend and I are currently evaluating recommendation techniques for
>> >>>> personalizing a newsletter for a company selling tea, spices and some
>> >>>> other products. Mahout is a great product that saves me hours of time
>> >>>> and a great deal of money. Because I want to give something back, I am
>> >>>> writing this small case study to the mailing list.
>> >>>>
>> >>>> I am conducting offline tests to find out which recommender is the most
>> >>>> accurate. I am also interested in runtime behavior such as memory
>> >>>> consumption and running time.
>> >>>>
>> >>>> The data contains implicit feedback. A user's preference for a product
>> >>>> is the amount in grams that he bought of it (453 g ~ 1 pound). If a
>> >>>> certain product does not have this data, the value is replaced with 50.
>> >>>> So basically I want Mahout to predict how much of a certain product a
>> >>>> user will buy next. This is also helpful for demand planning. I am
>> >>>> currently not using any time data because I did not find a recommender
>> >>>> that uses this data.
>> >>>>
>> >>>> Users: 12858
>> >>>> Items: 5467
>> >>>> 121304 preferences
>> >>>> MaxPreference: 85850.0 (meaning that there is someone who ordered 85 kg
>> >>>> of a certain tea or spice)
>> >>>> MinPreference: 50.0
>> >>>>
>> >>>> Here are the pure benchmarks for accuracy in RMSE. They change during
>> >>>> every run of the evaluation (~15%):
>> >>>>
>> >>>> Evaluation of randomBased (baseline): 43045.380570443434
>> >>>> (RandomRecommender(model)) (Time: ~0.3 s) (Memory: 16MB)
>> >>>> Evaluation of ItemBased with Pearson Correlation: 315.5804958647985
>> >>>> (GenericItemBasedRecommender(model,
>> >>>> PearsonCorrelationSimilarity(model))) (Time: ~1s) (Memory: 35MB)
>> >>>> Evaluation of ItemBased with uncentered Cosine: 198.25393235323375
>> >>>> (GenericItemBasedRecommender(model,
>> >>>> UncenteredCosineSimilarity(model))) (Time: ~1s) (Memory: 32MB)
>> >>>> Evaluation of ItemBased with log likelihood: 176.45243607278724
>> >>>> (GenericItemBasedRecommender(model, LogLikelihoodSimilarity(model)))
>> >>>> (Time: ~5s) (Memory: 42MB)
>> >>>> Evaluation of UserBased 3 with Pearson Correlation: 1378.1188069379868
>> >>>> (GenericUserBasedRecommender(model, NearestNUserNeighborhood(3,
>> >>>> PearsonCorrelationSimilarity(model), model),
>> >>>> PearsonCorrelationSimilarity(model))) (Time: ~52s) (Memory: 57MB)
>> >>>> Evaluation of UserBased 20 with Pearson Correlation: 1144.1905989614288
>> >>>> (GenericUserBasedRecommender(model, NearestNUserNeighborhood(20,
>> >>>> PearsonCorrelationSimilarity(model), model),
>> >>>> PearsonCorrelationSimilarity(model))) (Time: ~51s) (Memory: 57MB)
>> >>>> Evaluation of SlopeOne: 464.8989330869532 (SlopeOneRecommender(model))
>> >>>> (Time: ~4s) (Memory: 604MB)
>> >>>> Evaluation of SVD based: 326.1050823499026 (ALSWRFactorizer(model, 100,
>> >>>> 0.3, 5)) (Time: ) (Memory: 691MB)
>> >>>>
>> >>>> These were measured with the following method:
>> >>>>
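>> >>>> import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
>> >>>> import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
>> >>>>
>> >>>> RecommenderEvaluator evaluator =
>> >>>>     new AverageAbsoluteDifferenceRecommenderEvaluator();
>> >>>> // 0.9: share of each user's preferences used for training;
>> >>>> // 1.0: share of users included in the evaluation.
>> >>>> double evaluation = evaluator.evaluate(randomBased, null, myModel, 0.9, 1.0);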
>> >>>>
>> >>>> Memory usage was about 50 MB in the item-based case. Slope One and the
>> >>>> SVD-based recommender seem to use the most memory (615MB & 691MB).
>> >>>>
>> >>>> The performance differs a lot. The fastest ones were the item-based
>> >>>> recommenders. They took about 1 to 5 seconds
>> >>>> (PearsonCorrelationSimilarity and UncenteredCosineSimilarity 1 s,
>> >>>> LogLikelihoodSimilarity 5 s). The user-based ones were a lot slower.
>> >>>>
>> >>>> The conclusion is that in my case the item-based approach is the
>> >>>> fastest, has the lowest memory consumption and is the most accurate.
>> >>>> Further, I can use the recommendedBecause function.
>> >>>>
>> >>>> Here is the spec of the computer:
>> >>>> 2.3GHz Intel Core i5 (4 Cores). 1024 MB for java virtual machine.
>> >>>>
>> >>>> In the next step, probably within the next 2 months, I have to design a
>> >>>> newsletter and send it to the customers. Then I can benchmark the user
>> >>>> acceptance rate of the recommendations.
>> >>>>
>> >>>> Any suggestions for enhancements are appreciated. If anybody is
>> >>>> interested in the dataset or the evaluation code, send me a private
>> >>>> email. I might be able to convince the company to give out the dataset
>> >>>> if the person is doing some interesting research.
>> >>>>
>> >>>> /Manuel
>> >>>> --
>> >>>> Manuel Blechschmidt
>> >>>> Dortustr. 57
>> >>>> 14467 Potsdam
>> >>>> Mobil: 0173/6322621
>> >>>> Twitter: http://twitter.com/Manuel_B
>> >>>>
>> >>>>
>> >>
>>
