Thanks for the answer, Ted.

On 21.11.2011, at 16:20, Ted Dunning wrote:
> Your product is subject to seasonality constraints (which teas are likely
> right now) and repeat buying. I would separate out the recommendation of
> repeat buys from the recommendation of new items.

Actually I want to generate an email with diverse recommendations. Something like:

Your personal top sellers: ... 3 items ...
Special Winter Sales: ... 3 items ...
This might be interesting for you: ... 6 items ...
This is new in our store: ... 3 items ...

> You may also find that item-item links on your web site are helpful. These
> are easy to get using this system.

Yes, the website is actually already using some very basic item-to-item
recommendations. So I am more interested in the newsletter part, especially
because there I can track which items are really attractive and which aren't.
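For the "This might be interesting for you" block I am currently thinking
along these lines. This is only a sketch, not the final newsletter code: it
uses the item-based recommender with log-likelihood similarity from the
benchmark quoted below, "tea-preferences.csv" is just a stand-in for the real
data export, and loadCustomerIds() is a placeholder for reading the customer
IDs from the shop database.

import java.io.File;
import java.util.Arrays;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class NewsletterRecommendationBlock {

  public static void main(String[] args) throws Exception {
    // userID,itemID,value export of the order data ("tea-preferences.csv" is only an example)
    DataModel model = new FileDataModel(new File("tea-preferences.csv"));

    // item-based recommender with log-likelihood similarity, as in the benchmark quoted below
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, new LogLikelihoodSimilarity(model));

    for (long userID : loadCustomerIds()) {
      // six items for the "This might be interesting for you" block
      List<RecommendedItem> block = recommender.recommend(userID, 6);
      for (RecommendedItem item : block) {
        // recommendedBecause() returns the already bought items that caused this
        // recommendation, which is handy for a "because you bought ..." line
        System.out.println(userID + " -> " + item.getItemID() + " because of "
            + recommender.recommendedBecause(userID, item.getItemID(), 2));
      }
    }
  }

  // placeholder: in the real newsletter job the IDs come from the shop database
  private static List<Long> loadCustomerIds() {
    return Arrays.asList(1L, 2L, 3L);
  }
}

The personal top sellers, winter sales and new-in-store blocks would probably
come straight from the shop database rather than from Mahout.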
/Manuel

> On Mon, Nov 21, 2011 at 11:46 AM, Manuel Blechschmidt <
> [email protected]> wrote:
>
>> Hello Sean,
>>
>> On 21.11.2011, at 12:16, Sean Owen wrote:
>>
>>> Yes, because you have fewer items, an item-item-similarity-based algorithm
>>> probably runs much faster.
>>
>> Thanks for your blazing fast feedback.
>>
>>> I would not necessarily use the raw number of kg as a preference. It's not
>>> really true that someone who buys 10 kg of an item likes it 10x more than
>>> one he buys 1 kg of. Maybe the second spice is much more valuable? I would
>>> at least try taking the logarithm of the weight, but I think this is very
>>> noisy as a proxy for "preference". It creates illogical leaps -- because
>>> one user bought 85 kg of X, and Y is "similar" to X, this would conclude
>>> that you're somewhat likely to buy 85 kg of Y too. I would probably not
>>> use weight at all this way.
>>
>> Thanks for these suggestions. I will consider integrating a logarithmic
>> weight into the recommender. At the moment I am more concerned with getting
>> the user feedback component working. From some manual tests I can already
>> tell that the recommendations for some users make sense.
>>
>> Based on my own profile I can tell that when I buy more of a certain
>> product, I also like the product more.
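Coming back to the logarithmic weight from the quoted exchange above, I would
probably build the DataModel roughly like this. This is only a sketch I have
not run yet; the orderedGrams map stands for however the order quantities get
aggregated per user and item, and log1p is just a first guess for the scaling.

import java.util.Map;

import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

public class LogWeightModelBuilder {

  // orderedGrams: userID -> (itemID -> total ordered amount in grams, 50 g default
  // already applied); loading and aggregating the order lines is not shown here
  public static DataModel build(Map<Long, Map<Long, Double>> orderedGrams) {
    FastByIDMap<PreferenceArray> userData = new FastByIDMap<PreferenceArray>();
    for (Map.Entry<Long, Map<Long, Double>> user : orderedGrams.entrySet()) {
      Map<Long, Double> items = user.getValue();
      PreferenceArray prefs = new GenericUserPreferenceArray(items.size());
      int i = 0;
      for (Map.Entry<Long, Double> item : items.entrySet()) {
        prefs.setUserID(i, user.getKey());
        prefs.setItemID(i, item.getKey());
        // log1p of the grams instead of the raw value, so that an 85 kg order
        // does not dominate the similarities completely
        prefs.setValue(i, (float) Math.log1p(item.getValue()));
        i++;
      }
      userData.put(user.getKey(), prefs);
    }
    return new GenericDataModel(userData);
  }
}

Whether that actually improves the error compared to the raw grams I can then
check with the same evaluator as in the original mail below.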
>> I am also thinking about some seasonal tweaking. Tea is a very seasonal
>> product: during winter and Christmas other flavors are sold than in summer.
>> http://diuf.unifr.ch/main/is/sites/diuf.unifr.ch.main.is/files/documents/publications/WS07-08-011.pdf
>>
>>> It is therefore not surprising that log-likelihood works well, since it
>>> actually ignores this value.
>>>
>>> (You mentioned RMSE but your evaluation metric is
>>> average-absolute-difference -- L1, not L2).
>>
>> You are right, RMSE (root mean squared error) is wrong. I think it is MAE
>> (mean absolute error).
>>
>>> This is quite a small data set so you should have no performance issues.
>>> Your evaluations, which run over all users in the data set, are taking
>>> mere seconds. I am sure you could get away with much less
>>> memory/processing if you like.
>>
>> This is by far good enough. The more important part is the newsletter
>> sending. I have to generate about 10,000 emails, and that causes more
>> headaches than the recommender.
>>
>> /Manuel
>>
>>> On Mon, Nov 21, 2011 at 11:06 AM, Manuel Blechschmidt <
>>> [email protected]> wrote:
>>>
>>>> Hello Mahout team, hello users,
>>>>
>>>> a friend and I are currently evaluating recommendation techniques for
>>>> personalizing a newsletter for a company selling tea, spices and some
>>>> other products. Mahout is such a great product; it saves me hours of
>>>> time and a lot of money. Because I want to give something back, I am
>>>> writing this small case study to the mailing list.
>>>>
>>>> I am running an offline test of which recommender is the most accurate
>>>> one. Further, I am interested in runtime behavior such as memory
>>>> consumption and execution time.
>>>>
>>>> The data contains implicit feedback. The preference of a user is the
>>>> amount in grams that he bought of a certain product (453 g ~ 1 pound).
>>>> If a certain product does not have this data, the value is replaced
>>>> with 50. So basically I want Mahout to predict how much of a certain
>>>> product a user will buy next. This is also helpful for demand planning.
>>>> I am currently not using any time data because I did not find a
>>>> recommender which uses such data.
>>>>
>>>> Users: 12858
>>>> Items: 5467
>>>> Preferences: 121304
>>>> MaxPreference: 85850.0 (meaning that someone ordered 85 kg of a certain
>>>> tea or spice)
>>>> MinPreference: 50.0
>>>>
>>>> Here are the benchmarks for accuracy in RMSE. They change by about 15%
>>>> during every run of the evaluation:
>>>>
>>>> Evaluation of randomBased (baseline): 43045.380570443434
>>>> (RandomRecommender(model)) (Time: ~0.3 s) (Memory: 16 MB)
>>>> Evaluation of ItemBased with Pearson correlation: 315.5804958647985
>>>> (GenericItemBasedRecommender(model, PearsonCorrelationSimilarity(model)))
>>>> (Time: ~1 s) (Memory: 35 MB)
>>>> Evaluation of ItemBased with uncentered cosine: 198.25393235323375
>>>> (GenericItemBasedRecommender(model, UncenteredCosineSimilarity(model)))
>>>> (Time: ~1 s) (Memory: 32 MB)
>>>> Evaluation of ItemBased with log-likelihood: 176.45243607278724
>>>> (GenericItemBasedRecommender(model, LogLikelihoodSimilarity(model)))
>>>> (Time: ~5 s) (Memory: 42 MB)
>>>> Evaluation of UserBased 3 with Pearson correlation: 1378.1188069379868
>>>> (GenericUserBasedRecommender(model, NearestNUserNeighborhood(3,
>>>> PearsonCorrelationSimilarity(model), model),
>>>> PearsonCorrelationSimilarity(model))) (Time: ~52 s) (Memory: 57 MB)
>>>> Evaluation of UserBased 20 with Pearson correlation: 1144.1905989614288
>>>> (GenericUserBasedRecommender(model, NearestNUserNeighborhood(20,
>>>> PearsonCorrelationSimilarity(model), model),
>>>> PearsonCorrelationSimilarity(model))) (Time: ~51 s) (Memory: 57 MB)
>>>> Evaluation of SlopeOne: 464.8989330869532 (SlopeOneRecommender(model))
>>>> (Time: ~4 s) (Memory: 604 MB)
>>>> Evaluation of SVD-based: 326.1050823499026 (ALSWRFactorizer(model, 100,
>>>> 0.3, 5)) (Time: ) (Memory: 691 MB)
>>>>
>>>> These were measured with the following method:
>>>>
>>>> RecommenderEvaluator evaluator =
>>>>     new AverageAbsoluteDifferenceRecommenderEvaluator();
>>>> double evaluation = evaluator.evaluate(randomBased, null, myModel, 0.9, 1.0);
>>>>
>>>> Memory usage was about 50 MB in the item-based cases. Slope One and the
>>>> SVD-based recommender seem to use the most memory (615 MB and 691 MB).
>>>>
>>>> The performance differs a lot. The fastest recommenders were the
>>>> item-based ones; they took about 1 to 5 seconds
>>>> (PearsonCorrelationSimilarity and UncenteredCosineSimilarity ~1 s,
>>>> LogLikelihoodSimilarity ~5 s). The user-based ones were a lot slower.
>>>>
>>>> The conclusion is that in my case the item-based approach is the
>>>> fastest and most accurate one and has the lowest memory consumption.
>>>> Further, I can use the recommendedBecause function.
>>>>
>>>> Here is the spec of the computer:
>>>> 2.3 GHz Intel Core i5 (4 cores), 1024 MB for the Java virtual machine.
>>>>
>>>> In the next step, probably within the next two months, I have to design
>>>> a newsletter and send it to the customers. Then I can benchmark the
>>>> user acceptance rate of the recommendations.
>>>>
>>>> Any suggestions for enhancements are appreciated. If anybody is
>>>> interested in the dataset or the evaluation code, send me a private
>>>> email. I might be able to convince the company to give out the dataset
>>>> if the person is doing some interesting research.
>>>>
>>>> /Manuel
>>>> --
>>>> Manuel Blechschmidt
>>>> Dortustr. 57
>>>> 14467 Potsdam
>>>> Mobile: 0173/6322621
>>>> Twitter: http://twitter.com/Manuel_B
>>
>> --
>> Manuel Blechschmidt
>> Dortustr. 57
>> 14467 Potsdam
>> Mobile: 0173/6322621
>> Twitter: http://twitter.com/Manuel_B

--
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobile: 0173/6322621
Twitter: http://twitter.com/Manuel_B
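PS: For anyone who wants to reproduce the numbers above, a self-contained
sketch of the evaluation for the item-based log-likelihood setup looks roughly
like this; "tea-preferences.csv" is again only an example file name, and the
timing and memory measurements are left out.

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class EvaluationSketch {

  public static void main(String[] args) throws Exception {
    // same userID,itemID,value export as above ("tea-preferences.csv" is only an example)
    DataModel model = new FileDataModel(new File("tea-preferences.csv"));

    // builds a fresh recommender for every training split the evaluator hands over
    RecommenderBuilder itemBasedLogLikelihood = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel trainingData) throws TasteException {
        return new GenericItemBasedRecommender(trainingData,
            new LogLikelihoodSimilarity(trainingData));
      }
    };

    // average absolute difference (MAE), 90% training data, evaluated for all users
    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    double score = evaluator.evaluate(itemBasedLogLikelihood, null, model, 0.9, 1.0);
    System.out.println("MAE: " + score);
  }
}

The other rows of the benchmark would then just use a different RecommenderBuilder.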
