Manuel, I started running the evaluation as proposed, but it seems it will
take forever to complete. It runs the evaluation for each user, and that
takes well over a minute per user. What am I doing wrong?
This is my code:
RecommenderBuilder itemBasedBuilder = new RecommenderBuilder() {
    public Recommender buildRecommender(DataModel model) {
        // build and return the Recommender to evaluate here
        try {
            ItemSimilarity itemSimilarity = new CachingItemSimilarity(
                    new LogLikelihoodSimilarity(model), model);
            CandidateItemsStrategy candidateItemsStrategy =
                    new OptimizedItemStrategy(20, 2, 100);
            MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy =
                    new OptimizedItemStrategy(20, 2, 100);
            ItemBasedRecommender recommender = new GenericBooleanPrefItemBasedRecommender(
                    dataModel, itemSimilarity, candidateItemsStrategy,
                    mostSimilarItemsCandidateItemsStrategy);
            return recommender;
        } catch (TasteException e) {
            e.printStackTrace();
            return null;
        }
    }
};
RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
try {
    IRStatistics stats = evaluator.evaluate(itemBasedBuilder, null,
            this.dataModel, null, 3, 0, 1.0);
    logger.info("Evaluate returned: " + stats.toString());
} catch (TasteException e) {
    logger.error("", e);
}
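For reference, if I read the evaluate() signature correctly, the last argument is the
percentage of users to include in the evaluation; a sketch like the following (untested)
would only sample roughly 10% of the users instead of all of them:

try {
    // Untested sketch: same evaluation, but (assuming the last argument is the
    // evaluation percentage) only about 10% of the users are evaluated.
    IRStatistics sampledStats = evaluator.evaluate(itemBasedBuilder, null,
            this.dataModel, null, 3, 0, 0.1);
    logger.info("Sampled evaluate returned: " + sampledStats.toString());
} catch (TasteException e) {
    logger.error("", e);
}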
On Fri, Dec 2, 2011 at 1:29 PM, Daniel Zohar <[email protected]> wrote:
> Hello Manuel,
> I will run the tests as requested and post the results later.
>
>
> On Fri, Dec 2, 2011 at 1:20 PM, Manuel Blechschmidt <
> [email protected]> wrote:
>
>> Hello Daniel,
>>
>> On 02.12.2011, at 12:02, Daniel Zohar wrote:
>>
>> > Hi guys,
>> >
>> > ...
>> > I just ran the fix I proposed earlier and I got great results! The query
>> > time was reduced to about a third for the 'heavy users': before it was
>> > 1-5 seconds and now it's 0.5-1.5. The best part is that the accuracy level
>> > should remain exactly the same. I also believe it should reduce memory
>> > consumption, as GenericBooleanPrefDataModel.preferenceForItems gets
>> > significantly smaller (in my case at least).
>>
>> It would be great if you could measure your run time performance and your
>> accuracy with the provided Mahout tools.
>>
>> In your case, because you only have boolean feedback, precision and recall
>> would make sense.
>>
>> https://cwiki.apache.org/MAHOUT/recommender-documentation.html
>>
>> RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
>> IRStatistics stats = evaluator.evaluate(builder, null, myModel, null, 3,
>>         RecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
>>
>>
>> Here is some example code from me:
>>
>> public void testEvaluateRecommender() {
>>     try {
>>         DataModel myModel = new MyModelImplementationDataModel();
>>
>>         // Users: 12858
>>         // Items: 5467
>>         // MaxPreference: 85850.0
>>         // MinPreference: 50.0
>>         System.out.println("Users: " + myModel.getNumUsers());
>>         System.out.println("Items: " + myModel.getNumItems());
>>         System.out.println("MaxPreference: " + myModel.getMaxPreference());
>>         System.out.println("MinPreference: " + myModel.getMinPreference());
>>
>>         RecommenderBuilder randomBased = new RecommenderBuilder() {
>>             public Recommender buildRecommender(DataModel model) {
>>                 // build and return the Recommender to evaluate here
>>                 try {
>>                     return new RandomRecommender(model);
>>                 } catch (TasteException e) {
>>                     e.printStackTrace();
>>                     return null;
>>                 }
>>             }
>>         };
>>
>>         RecommenderBuilder genericItemBased = new RecommenderBuilder() {
>>             public Recommender buildRecommender(DataModel model) {
>>                 try {
>>                     return new GenericItemBasedRecommender(model,
>>                             new PearsonCorrelationSimilarity(model));
>>                 } catch (TasteException e) {
>>                     e.printStackTrace();
>>                     return null;
>>                 }
>>             }
>>         };
>>
>>         RecommenderBuilder genericItemBasedCosine = new RecommenderBuilder() {
>>             public Recommender buildRecommender(DataModel model) {
>>                 try {
>>                     return new GenericItemBasedRecommender(model,
>>                             new UncenteredCosineSimilarity(model));
>>                 } catch (TasteException e) {
>>                     e.printStackTrace();
>>                     return null;
>>                 }
>>             }
>>         };
>>
>>         RecommenderBuilder genericItemBasedLikely = new RecommenderBuilder() {
>>             public Recommender buildRecommender(DataModel model) {
>>                 return new GenericItemBasedRecommender(model,
>>                         new LogLikelihoodSimilarity(model));
>>             }
>>         };
>>
>>         RecommenderBuilder genericUserBasedNN3 = new RecommenderBuilder() {
>>             public Recommender buildRecommender(DataModel model) {
>>                 try {
>>                     return new GenericUserBasedRecommender(model,
>>                             new NearestNUserNeighborhood(3,
>>                                     new PearsonCorrelationSimilarity(model), model),
>>                             new PearsonCorrelationSimilarity(model));
>>                 } catch (TasteException e) {
>>                     e.printStackTrace();
>>                     return null;
>>                 }
>>             }
>>         };
>>
>>         RecommenderBuilder genericUserBasedNN20 = new RecommenderBuilder() {
>>             public Recommender buildRecommender(DataModel model) {
>>                 try {
>>                     return new GenericUserBasedRecommender(model,
>>                             new NearestNUserNeighborhood(20,
>>                                     new PearsonCorrelationSimilarity(model), model),
>>                             new PearsonCorrelationSimilarity(model));
>>                 } catch (TasteException e) {
>>                     e.printStackTrace();
>>                     return null;
>>                 }
>>             }
>>         };
>>
>>         RecommenderBuilder slopeOneBased = new RecommenderBuilder() {
>>             public Recommender buildRecommender(DataModel model) {
>>                 try {
>>                     return new SlopeOneRecommender(model);
>>                 } catch (TasteException e) {
>>                     e.printStackTrace();
>>                     return null;
>>                 }
>>             }
>>         };
>>
>>         RecommenderBuilder svdBased = new RecommenderBuilder() {
>>             public Recommender buildRecommender(DataModel model) {
>>                 try {
>>                     return new SVDRecommender(model,
>>                             new ALSWRFactorizer(model, 100, 0.3, 5));
>>                 } catch (TasteException e) {
>>                     e.printStackTrace();
>>                     return null;
>>                 }
>>             }
>>         };
>>
>>         // Data set summary:
>>         // 12858 users
>>         // 121304 preferences
>>
>>         RecommenderEvaluator evaluator =
>>                 new AverageAbsoluteDifferenceRecommenderEvaluator();
>>
>>         double evaluation = evaluator.evaluate(randomBased, null, myModel, 0.9, 1.0);
>>         // Evaluation of randomBased (baseline): 43045.380570443434
>>         // (RandomRecommender(model))
>>         System.out.println("Evaluation of randomBased (baseline): " + evaluation);
>>
>>         // evaluation = evaluator.evaluate(genericItemBased, null, myModel, 0.9, 1.0);
>>         // Evaluation of ItemBased with Pearson Correlation: 315.5804958647985
>>         // (GenericItemBasedRecommender(model, PearsonCorrelationSimilarity(model)))
>>         // System.out.println("Evaluation of ItemBased with Pearson Correlation: " + evaluation);
>>
>>         // evaluation = evaluator.evaluate(genericItemBasedCosine, null, myModel, 0.9, 1.0);
>>         // Evaluation of ItemBased with uncentered Cosine: 198.25393235323375
>>         // (GenericItemBasedRecommender(model, UncenteredCosineSimilarity(model)))
>>         // System.out.println("Evaluation of ItemBased with Uncentered Cosine: " + evaluation);
>>
>>         evaluation = evaluator.evaluate(genericItemBasedLikely, null, myModel, 0.9, 1.0);
>>         // Evaluation of ItemBased with log likelihood: 176.45243607278724
>>         // (GenericItemBasedRecommender(model, LogLikelihoodSimilarity(model)))
>>         System.out.println("Evaluation of ItemBased with LogLikelihood: " + evaluation);
>>
>>         // User based is slow and inaccurate
>>         // evaluation = evaluator.evaluate(genericUserBasedNN3, null, myModel, 0.9, 1.0);
>>         // Evaluation of UserBased 3 with Pearson Correlation: 1774.9897130330407
>>         // (GenericUserBasedRecommender(model, NearestNUserNeighborhood(3,
>>         //  PearsonCorrelationSimilarity(model), model), PearsonCorrelationSimilarity(model)))
>>         // took about 2 minutes
>>         // System.out.println("Evaluation of UserBased 3 with Pearson Correlation: " + evaluation);
>>
>>         // evaluation = evaluator.evaluate(genericUserBasedNN20, null, myModel, 0.9, 1.0);
>>         // Evaluation of UserBased 20 with Pearson Correlation: 1329.137324225053
>>         // (GenericUserBasedRecommender(model, NearestNUserNeighborhood(20,
>>         //  PearsonCorrelationSimilarity(model), model), PearsonCorrelationSimilarity(model)))
>>         // took about 3 minutes
>>         // System.out.println("Evaluation of UserBased 20 with Pearson Correlation: " + evaluation);
>>
>>         // evaluation = evaluator.evaluate(slopeOneBased, null, myModel, 0.9, 1.0);
>>         // Evaluation of SlopeOne: 464.8989330869532
>>         // (SlopeOneRecommender(model))
>>         // System.out.println("Evaluation of SlopeOne: " + evaluation);
>>
>>         // evaluation = evaluator.evaluate(svdBased, null, myModel, 0.9, 1.0);
>>         // Evaluation of SVD based: 378.9776153202042
>>         // (ALSWRFactorizer(model, 100, 0.3, 5))
>>         // took about 10 minutes to calculate on a MacBook Pro
>>         // System.out.println("Evaluation of SVD based: " + evaluation);
>>
>>     } catch (TasteException e) {
>>         e.printStackTrace();
>>     }
>> }
>>
>> >
>> > The fix is merely adding two lines of code to one of
>> > the GenericBooleanPrefDataModel constructors. See
>> > http://pastebin.com/K5PB68Et; the lines I added are #11 and #22.
>> >
>> > The only problem I see at the moment is that the similarity
>> > implementations use the number of users per item in the item-item
>> > similarity calculation. This _can_ be mitigated by creating an
>> > additional Map in the DataModel which maps itemID to numUsers.
>> >
>> > What do you think about the proposed solution? Perhaps I am missing some
>> > other implications?
>> >
>> > Thanks!
>> >
>> >
>> > On Fri, Dec 2, 2011 at 12:51 AM, Sean Owen <[email protected]> wrote:
>> >
>> >> (Agree, and the sampling happens at the user level now -- so if you
>> >> sample one of these users, it slows down a lot. The spirit of the
>> >> proposed change is to make sampling more fine-grained, at the individual
>> >> item level. That certainly seems to fix this.)
>> >>
>> >> On Thu, Dec 1, 2011 at 10:46 PM, Ted Dunning <[email protected]>
>> >> wrote:
>> >>
>> >>> This may or may not help much. My guess is that the improvement will
>> >>> be very modest.
>> >>>
>> >>> The most serious problem is going to be recommendations for anybody who
>> >>> has rated one of these excessively popular items. That item will bring
>> >>> in a huge number of other users and thus a huge number of items to
>> >>> consider. If you down-sample ratings of the prolific users and kill
>> >>> super-common items, I think you will see much more improvement than
>> >>> simply eliminating the singleton users.
>> >>>
>> >>> The basic issue is that cooccurrence-based algorithms have run time
>> >>> proportional to O(n_max^2), where n_max is the maximum number of items
>> >>> per user.
>> >>>
>> >>> On Thu, Dec 1, 2011 at 2:35 PM, Daniel Zohar <[email protected]> wrote:
>> >>>
>> >>>> This is why I'm now looking into improving GenericBooleanPrefDataModel
>> >>>> so that it does not take into account users who made only one interaction
>> >>>> in the 'preferenceForItems' Map. What do you think about this approach?
>> >>>>
>> >>>
>> >>
>>
>> --
>> Manuel Blechschmidt
>> Dortustr. 57
>> 14467 Potsdam
>> Mobil: 0173/6322621
>> Twitter: http://twitter.com/Manuel_B
>>
>>
>