No worries, Manuel. I think we have almost cracked the problem. Let's wait for Sean's response. Cheers
On Mon, Dec 5, 2011 at 2:46 PM, Manuel Blechschmidt <[email protected]> wrote:

> Hi Daniel,
> unfortunately I have not yet tried the IRStatistics evaluator, so I am
> not able to diagnose the performance problems. At the moment I also have
> to work on some other stuff.
>
> I know that it uses a thread pool to parallelize the evaluation.
> Perhaps you could sample down your data set or let it run overnight.
>
> Sorry
> Manuel
>
> On 02.12.2011, at 16:26, Daniel Zohar wrote:
>
> > Manuel, I started running the evaluation as proposed, but it seems it
> > will take forever to complete. It does the evaluation for each user,
> > which takes well over a minute. What am I doing wrong?
> > This is my code:
> >
> > RecommenderBuilder itemBasedBuilder = new RecommenderBuilder() {
> >     public Recommender buildRecommender(DataModel model) {
> >         // build and return the Recommender to evaluate here
> >         try {
> >             ItemSimilarity itemSimilarity = new CachingItemSimilarity(
> >                     new LogLikelihoodSimilarity(model), model);
> >             CandidateItemsStrategy candidateItemsStrategy =
> >                     new OptimizedItemStrategy(20, 2, 100);
> >             MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy =
> >                     new OptimizedItemStrategy(20, 2, 100);
> >             // note: use the training model passed in, not the outer dataModel field
> >             return new GenericBooleanPrefItemBasedRecommender(model,
> >                     itemSimilarity, candidateItemsStrategy,
> >                     mostSimilarItemsCandidateItemsStrategy);
> >         } catch (TasteException e) {
> >             e.printStackTrace();
> >             return null;
> >         }
> >     }
> > };
> >
> > RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
> > try {
> >     IRStatistics stats = evaluator.evaluate(itemBasedBuilder, null,
> >             this.dataModel, null, 3, 0, 1.0);
> >     logger.info("Evaluate returned: " + stats.toString());
> > } catch (TasteException e) {
> >     logger.error("", e);
> > }
> >
> > On Fri, Dec 2, 2011 at 1:29 PM, Daniel Zohar <[email protected]> wrote:
> >
> >> Hello Manuel,
> >> I will run the tests as requested and post the results later.
> >>
> >>
> >> On Fri, Dec 2, 2011 at 1:20 PM, Manuel Blechschmidt <
> >> [email protected]> wrote:
> >>
> >>> Hello Daniel,
> >>>
> >>> On 02.12.2011, at 12:02, Daniel Zohar wrote:
> >>>
> >>>> Hi guys,
> >>>>
> >>>> ...
> >>>> I just ran the fix I proposed earlier and I got great results! The
> >>>> query time was reduced to about a third for the 'heavy users'.
> >>>> Before it was 1-5 secs and now it's 0.5-1.5. The best part is that
> >>>> the accuracy level should remain exactly the same. I also believe it
> >>>> should reduce memory consumption, as the
> >>>> GenericBooleanPrefDataModel.preferenceForItems gets significantly
> >>>> smaller (in my case at least).
> >>>
> >>> It would be great if you could measure your runtime performance and
> >>> your accuracy with the provided Mahout tools.
> >>>
> >>> In your case, because you only have boolean feedback, precision and
> >>> recall would make sense.
> >>>
> >>> https://cwiki.apache.org/MAHOUT/recommender-documentation.html
> >>>
> >>> RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
> >>> IRStatistics stats = evaluator.evaluate(builder, null, myModel, null, 3,
> >>>         RecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
> >>>
> >>> Here is some example code from me:
> >>>
> >>> public void testEvaluateRecommender() {
> >>>     try {
> >>>         DataModel myModel = new MyModelImplementationDataModel();
> >>>
> >>>         // Users: 12858
> >>>         // Items: 5467
> >>>         // MaxPreference: 85850.0
> >>>         // MinPreference: 50.0
> >>>         System.out.println("Users: " + myModel.getNumUsers());
> >>>         System.out.println("Items: " + myModel.getNumItems());
> >>>         System.out.println("MaxPreference: " + myModel.getMaxPreference());
> >>>         System.out.println("MinPreference: " + myModel.getMinPreference());
> >>>
> >>>         RecommenderBuilder randomBased = new RecommenderBuilder() {
> >>>             public Recommender buildRecommender(DataModel model) {
> >>>                 // build and return the Recommender to evaluate here
> >>>                 try {
> >>>                     return new RandomRecommender(model);
> >>>                 } catch (TasteException e) {
> >>>                     e.printStackTrace();
> >>>                     return null;
> >>>                 }
> >>>             }
> >>>         };
> >>>
> >>>         RecommenderBuilder genericItemBased = new RecommenderBuilder() {
> >>>             public Recommender buildRecommender(DataModel model) {
> >>>                 try {
> >>>                     return new GenericItemBasedRecommender(model,
> >>>                             new PearsonCorrelationSimilarity(model));
> >>>                 } catch (TasteException e) {
> >>>                     e.printStackTrace();
> >>>                     return null;
> >>>                 }
> >>>             }
> >>>         };
> >>>
> >>>         RecommenderBuilder genericItemBasedCosine = new RecommenderBuilder() {
> >>>             public Recommender buildRecommender(DataModel model) {
> >>>                 try {
> >>>                     return new GenericItemBasedRecommender(model,
> >>>                             new UncenteredCosineSimilarity(model));
> >>>                 } catch (TasteException e) {
> >>>                     e.printStackTrace();
> >>>                     return null;
> >>>                 }
> >>>             }
> >>>         };
> >>>
> >>>         RecommenderBuilder genericItemBasedLikely = new RecommenderBuilder() {
> >>>             public Recommender buildRecommender(DataModel model) {
> >>>                 return new GenericItemBasedRecommender(model,
> >>>                         new LogLikelihoodSimilarity(model));
> >>>             }
> >>>         };
> >>>
> >>>         RecommenderBuilder genericUserBasedNN3 = new RecommenderBuilder() {
> >>>             public Recommender buildRecommender(DataModel model) {
> >>>                 try {
> >>>                     return new GenericUserBasedRecommender(model,
> >>>                             new NearestNUserNeighborhood(3,
> >>>                                     new PearsonCorrelationSimilarity(model), model),
> >>>                             new PearsonCorrelationSimilarity(model));
> >>>                 } catch (TasteException e) {
> >>>                     e.printStackTrace();
> >>>                     return null;
> >>>                 }
> >>>             }
> >>>         };
> >>>
> >>>         RecommenderBuilder genericUserBasedNN20 = new RecommenderBuilder() {
> >>>             public Recommender buildRecommender(DataModel model) {
> >>>                 try {
> >>>                     return new GenericUserBasedRecommender(model,
> >>>                             new NearestNUserNeighborhood(20,
> >>>                                     new PearsonCorrelationSimilarity(model), model),
> >>>                             new PearsonCorrelationSimilarity(model));
> >>>                 } catch (TasteException e) {
> >>>                     e.printStackTrace();
> >>>                     return null;
> >>>                 }
> >>>             }
> >>>         };
> >>>
> >>>         RecommenderBuilder slopeOneBased = new RecommenderBuilder() {
> >>>             public Recommender buildRecommender(DataModel model) {
> >>>                 try {
> >>>                     return new SlopeOneRecommender(model);
> >>>                 } catch (TasteException e) {
> >>>                     e.printStackTrace();
> >>>                     return null;
> >>>                 }
> >>>             }
> >>>         };
> >>>
> >>>         RecommenderBuilder svdBased = new RecommenderBuilder() {
> >>>             public Recommender buildRecommender(DataModel model) {
> >>>                 try {
> >>>                     return new SVDRecommender(model,
> >>>                             new ALSWRFactorizer(model, 100, 0.3, 5));
> >>>                 } catch (TasteException e) {
> >>>                     e.printStackTrace();
> >>>                     return null;
> >>>                 }
> >>>             }
> >>>         };
> >>>
> >>>         // Data set summary:
> >>>         // 12858 users
> >>>         // 121304 preferences
> >>>
> >>>         RecommenderEvaluator evaluator =
> >>>                 new AverageAbsoluteDifferenceRecommenderEvaluator();
> >>>
> >>>         double evaluation = evaluator.evaluate(randomBased, null, myModel, 0.9, 1.0);
> >>>         // Evaluation of randomBased (baseline): 43045.380570443434
> >>>         // (RandomRecommender(model))
> >>>         System.out.println("Evaluation of randomBased (baseline): " + evaluation);
> >>>
> >>>         // evaluation = evaluator.evaluate(genericItemBased, null, myModel, 0.9, 1.0);
> >>>         // Evaluation of ItemBased with Pearson Correlation: 315.5804958647985
> >>>         // (GenericItemBasedRecommender(model, PearsonCorrelationSimilarity(model)))
> >>>         // System.out.println("Evaluation of ItemBased with Pearson Correlation: " + evaluation);
> >>>
> >>>         // evaluation = evaluator.evaluate(genericItemBasedCosine, null, myModel, 0.9, 1.0);
> >>>         // Evaluation of ItemBased with uncentered Cosine: 198.25393235323375
> >>>         // (GenericItemBasedRecommender(model, UncenteredCosineSimilarity(model)))
> >>>         // System.out.println("Evaluation of ItemBased with Uncentered Cosine: " + evaluation);
> >>>
> >>>         evaluation = evaluator.evaluate(genericItemBasedLikely, null, myModel, 0.9, 1.0);
> >>>         // Evaluation of ItemBased with log likelihood: 176.45243607278724
> >>>         // (GenericItemBasedRecommender(model, LogLikelihoodSimilarity(model)))
> >>>         System.out.println("Evaluation of ItemBased with LogLikelihood: " + evaluation);
> >>>
> >>>         // User based is slow and inaccurate
> >>>         // evaluation = evaluator.evaluate(genericUserBasedNN3, null, myModel, 0.9, 1.0);
> >>>         // Evaluation of UserBased 3 with Pearson Correlation: 1774.9897130330407
> >>>         // (GenericUserBasedRecommender(model, NearestNUserNeighborhood(3,
> >>>         //     PearsonCorrelationSimilarity(model), model), PearsonCorrelationSimilarity(model)))
> >>>         // took about 2 minutes
> >>>         // System.out.println("Evaluation of UserBased 3 with Pearson Correlation: " + evaluation);
> >>>
> >>>         // evaluation = evaluator.evaluate(genericUserBasedNN20, null, myModel, 0.9, 1.0);
> >>>         // Evaluation of UserBased 20 with Pearson Correlation: 1329.137324225053
> >>>         // (GenericUserBasedRecommender(model, NearestNUserNeighborhood(20,
> >>>         //     PearsonCorrelationSimilarity(model), model), PearsonCorrelationSimilarity(model)))
> >>>         // took about 3 minutes
> >>>         // System.out.println("Evaluation of UserBased 20 with Pearson Correlation: " + evaluation);
> >>>
> >>>         // evaluation = evaluator.evaluate(slopeOneBased, null, myModel, 0.9, 1.0);
> >>>         // Evaluation of SlopeOne: 464.8989330869532
> >>>         // (SlopeOneRecommender(model))
> >>>         // System.out.println("Evaluation of SlopeOne: " + evaluation);
> >>>
> >>>         // evaluation = evaluator.evaluate(svdBased, null, myModel, 0.9, 1.0);
> >>>         // Evaluation of SVD based: 378.9776153202042
> >>>         // (ALSWRFactorizer(model, 100, 0.3, 5))
> >>>         // took about 10 minutes to calculate on a MacBook Pro
> >>>         // System.out.println("Evaluation of SVD based: " + evaluation);
> >>>
> >>>     } catch (TasteException e) {
> >>>         e.printStackTrace();
> >>>     }
> >>> }
> >>>
> >>>> The fix is merely adding two lines of code to one of
> >>>> the GenericBooleanPrefDataModel constructors. See
> >>>> http://pastebin.com/K5PB68Et; the lines I added are #11 and #22.
> >>>>
> >>>> The only problem I see at the moment is that the similarity
> >>>> implementations use the number of users per item in the item-item
> >>>> similarity calculation. This _can_ be mitigated by creating an
> >>>> additional Map in the DataModel which maps itemID to numUsers.
> >>>>
> >>>> What do you think about the proposed solution? Perhaps I am missing
> >>>> some other implications?
> >>>>
> >>>> Thanks!
> >>>>
> >>>> On Fri, Dec 2, 2011 at 12:51 AM, Sean Owen <[email protected]> wrote:
> >>>>
> >>>>> (Agree, and the sampling happens at the user level now -- so if you
> >>>>> sample one of these users, it slows down a lot. The spirit of the
> >>>>> proposed change is to make sampling more fine-grained, at the
> >>>>> individual item level. That seems to certainly fix this.)
> >>>>>
> >>>>> On Thu, Dec 1, 2011 at 10:46 PM, Ted Dunning <[email protected]> wrote:
> >>>>>
> >>>>>> This may or may not help much. My guess is that the improvement
> >>>>>> will be very modest.
> >>>>>>
> >>>>>> The most serious problem is going to be recommendations for anybody
> >>>>>> who has rated one of these excessively popular items. That item
> >>>>>> will bring in a huge number of other users and thus a huge number
> >>>>>> of items to consider. If you down-sample ratings of the prolific
> >>>>>> users and kill super-common items, I think you will see much more
> >>>>>> improvement than simply eliminating the singleton users.
> >>>>>>
> >>>>>> The basic issue is that cooccurrence-based algorithms have run time
> >>>>>> proportional to O(n_max^2), where n_max is the maximum number of
> >>>>>> items per user.
> >>>>>>
> >>>>>> On Thu, Dec 1, 2011 at 2:35 PM, Daniel Zohar <[email protected]> wrote:
> >>>>>>
> >>>>>>> This is why I'm looking now into improving
> >>>>>>> GenericBooleanPrefDataModel to not take into account users who
> >>>>>>> made only one interaction under the 'preferenceForItems' Map.
> >>>>>>> What do you think about this approach?
> >>>
> >>> --
> >>> Manuel Blechschmidt
> >>> Dortustr. 57
> >>> 14467 Potsdam
> >>> Mobil: 0173/6322621
> >>> Twitter: http://twitter.com/Manuel_B
>
> --
> Manuel Blechschmidt
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
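
A note on the long-running evaluation quoted above: the seventh argument of
GenericRecommenderIRStatsEvaluator.evaluate() is the proportion of users to
evaluate, and Daniel's call passes 1.0, i.e. a full precision/recall pass for
every user. A minimal sketch of a sampled run, reusing the itemBasedBuilder
and dataModel from Daniel's code and assumed to live in a method that throws
TasteException; the 0.05 fraction is an illustrative value, not a
recommendation:

    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    // Precision and recall at 3, computed over roughly 5% of users instead of all;
    // CHOOSE_THRESHOLD lets the evaluator pick the relevance threshold per user.
    IRStatistics stats = evaluator.evaluate(itemBasedBuilder, null, dataModel, null, 3,
            GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 0.05);
    System.out.println("Precision@3: " + stats.getPrecision());
    System.out.println("Recall@3: " + stats.getRecall());
    System.out.println("F1@3: " + stats.getF1Measure());

Because users are sampled independently, a small fraction gives a quick but
noisy estimate; the fraction can be raised once the numbers stabilize.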
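On Ted's point about O(n_max^2) cost: a rough sketch of down-sampling prolific
users and dropping super-common items before building the model. sampleDown is
a hypothetical helper, not an existing Mahout class, and maxItemsPerUser /
maxUsersPerItem are illustrative knobs; imports from org.apache.mahout.*
(FastByIDMap, FastIDSet, GenericBooleanPrefDataModel, RandomUtils,
LongPrimitiveIterator) and java.util.Random are omitted, as elsewhere in this
thread. Daniel's singleton-user filtering is a related special case of the
same preprocessing.

    static DataModel sampleDown(DataModel dataModel, int maxItemsPerUser,
            int maxUsersPerItem) throws TasteException {
        FastByIDMap<FastIDSet> sampled = new FastByIDMap<FastIDSet>();
        Random random = RandomUtils.getRandom();
        LongPrimitiveIterator userIDs = dataModel.getUserIDs();
        while (userIDs.hasNext()) {
            long userID = userIDs.nextLong();
            FastIDSet itemIDs = dataModel.getItemIDsFromUser(userID);
            // keep roughly maxItemsPerUser preferences from prolific users
            double keepRatio = Math.min(1.0, (double) maxItemsPerUser / itemIDs.size());
            FastIDSet kept = new FastIDSet();
            LongPrimitiveIterator it = itemIDs.iterator();
            while (it.hasNext()) {
                long itemID = it.nextLong();
                // "kill super-common items": skip items nearly everyone has
                if (dataModel.getNumUsersWithPreferenceFor(itemID) > maxUsersPerItem) {
                    continue;
                }
                if (random.nextDouble() < keepRatio) {
                    kept.add(itemID);
                }
            }
            if (!kept.isEmpty()) {
                sampled.put(userID, kept);
            }
        }
        return new GenericBooleanPrefDataModel(sampled);
    }

Running the evaluation above against, say, sampleDown(dataModel, 100, 5000)
(again illustrative values) would show what the sampling costs in accuracy.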
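On Daniel's idea of an additional Map from itemID to numUsers: a sketch of
precomputing it outside the DataModel, where numUsersPerItem is an invented
name, not Mahout API. It would also remove the repeated
getNumUsersWithPreferenceFor() calls from the sampling sketch above.

    // Precompute itemID -> number of users with a preference for that item,
    // so similarity code need not walk the full preferenceForItems map.
    FastByIDMap<Integer> numUsersPerItem = new FastByIDMap<Integer>();
    LongPrimitiveIterator itemIDs = dataModel.getItemIDs();
    while (itemIDs.hasNext()) {
        long itemID = itemIDs.nextLong();
        numUsersPerItem.put(itemID, dataModel.getNumUsersWithPreferenceFor(itemID));
    }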
