Najum,

You should also be able to use the ItemSimilarityJob to compute a limited
indicator set.

This is stepping off of the path you have been on, but it would allow you
to deploy the recommender via a search engine. That makes a lot of code
simply vanish. This is also a well-trod production path.


On Thu, Apr 17, 2014 at 3:57 AM, Najum Ali <[email protected]> wrote:

> @Sebastian
>
> wow … you are right. The original csv file is about 21 MB and the
> corresponding precomputed item-item similarity file is about 260 MB!!
> And yes, there are far more than 50 "most similar items" for an item.
>
> Trying to restrict this to 50 (or something like that) most similar items
> for an item could do the trick as you said.
> Ok, I will give it a try and reply later.
>
> By the way, what about the SamplingCandidateItemsStrategy or something
> like this, by using this constructor:
>
> GenericItemBasedRecommender(DataModel dataModel,
>     ItemSimilarity similarity,
>     CandidateItemsStrategy candidateItemsStrategy,
>     MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy)
> <https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.html#GenericItemBasedRecommender(org.apache.mahout.cf.taste.model.DataModel,%20org.apache.mahout.cf.taste.similarity.ItemSimilarity,%20org.apache.mahout.cf.taste.recommender.CandidateItemsStrategy,%20org.apache.mahout.cf.taste.recommender.MostSimilarItemsCandidateItemsStrategy)>
>
>
> On 17.04.2014 at 12:41, Sebastian Schelter <[email protected]> wrote:
>
> Hi Najum,
>
> I think I found the
> problem. Remember: Two items are similar whenever at
> least one user interacted with both of them ("the items co-occur").
>
> In the movielens dataset this is true for almost all pairs of items,
> unfortunately. From 3706 items, more than 11 million similarities are
> created. A common approach for that (which is not yet implemented in our
> precomputation unfortunately) is to only retain the top-k similar items per
> item.
>
> A solution would be to take the csv file that is created by the
> MultithreadedBatchItemSimilarities and postprocess it so that only the 50
> most similar items per item are retained. That should help with your
> problem.
>
> Unfortunately, we don't have code for that yet, maybe you want to try to
> write that yourself?
>
> Best,
> Sebastian
>
> PS: The user-based recommender restricts the number of similar users, I
> guess that's why it is so fast here.
>
>
> On 04/17/2014 12:18 PM, Najum Ali wrote:
>
> Ok, here you go:
>
> I have created a simple class with main-method (no server and other stuff):
>
> public class RecommenderTest {
>     public static void main(String[] args) throws IOException, TasteException {
>         DataModel dataModel = new FileDataModel(new
>             File("/Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv"));
>         ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
>         ItemBasedRecommender recommender =
>             new GenericItemBasedRecommender(dataModel, similarity);
>
>         String pathToPreComputedFile =
>             preComputeSimilarities(recommender, dataModel.getNumItems());
>
>         InputStream inputStream = new FileInputStream(new File(pathToPreComputedFile));
>         BufferedReader bufferedReader =
>             new BufferedReader(new InputStreamReader(inputStream));
>         Collection<GenericItemSimilarity.ItemItemSimilarity> correlations =
>             bufferedReader.lines().map(mapToItemItemSimilarity).collect(Collectors.toList());
>         ItemSimilarity precomputedSimilarity = new GenericItemSimilarity(correlations);
>         ItemBasedRecommender recommenderWithPrecomputation =
>             new GenericItemBasedRecommender(dataModel, precomputedSimilarity);
>
>         recommend(recommender);
>         recommend(recommenderWithPrecomputation);
>     }
>
>     private static String preComputeSimilarities(ItemBasedRecommender recommender,
>             int simItemsPerItem) throws TasteException {
>         String pathToAbsolutePath = "";
>         try {
>             File resultFile = new File(System.getProperty("java.io.tmpdir"), "similarities.csv");
>             if (resultFile.exists()) {
>                 resultFile.delete();
>             }
>             BatchItemSimilarities batchJob =
>                 new MultithreadedBatchItemSimilarities(recommender, simItemsPerItem);
>             int numSimilarities = batchJob.computeItemSimilarities(
>                 Runtime.getRuntime().availableProcessors(), 1,
>                 new FileSimilarItemsWriter(resultFile));
>             pathToAbsolutePath = resultFile.getAbsolutePath();
>             System.out.println("Computed " + numSimilarities
>                 + " similarities and saved them to " + pathToAbsolutePath);
>         } catch (IOException e) {
>             System.out.println("Error while writing pre computed similarities to file");
>         }
>         return pathToAbsolutePath;
>     }
>
>     private static void recommend(ItemBasedRecommender recommender) throws TasteException {
>         long start = System.nanoTime();
>         List<RecommendedItem> recommendations = recommender.recommend(1, 10);
>         long end = System.nanoTime();
>         System.out.println("Created recommendations in "
>             + getCalculationTimeInMilliseconds(start, end)
>             + " ms. Recommendations:" + recommendations);
>     }
>
>     private static double getCalculationTimeInMilliseconds(long start, long end) {
>         double calculationTime = (end - start);
>         return (calculationTime / 1_000_000);
>     }
>
>     private static Function<String, GenericItemSimilarity.ItemItemSimilarity>
>             mapToItemItemSimilarity = (line) -> {
>         String[] row = line.split(",");
>         return new GenericItemSimilarity.ItemItemSimilarity(
>             Long.parseLong(row[0]), Long.parseLong(row[1]),
>             Double.parseDouble(row[2]));
>     };
> }
>
> And that's the output log:
>
> 3 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file /Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv
> 63 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
> 1207 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Processed 1000000 lines
> 1208 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 1000209
> 1475 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 6040 users
> 1599 [main] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - Queued 3706 items in 38 batches
> 10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches
> 10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches. done.
> 10978 [pool-1-thread-5] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 4 processed 4 batches. done.
> 11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches
> 11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches. done.
> 11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches
> 11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches. done.
> 11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches
> 11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches. done.
> 11730 [pool-1-thread-3] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 2 processed 4 batches. done.
> 11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches
> 11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches. done.
> 11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches
> 11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches. done.
> Computed 9174333 similarities and saved them to /var/folders/9g/4h38v1tj3ps9j21skc72b56r0000gn/T/similarities.csv
> Created recommendations in *1683.613 ms*. Recommendations:[
>     RecommendedItem[item:3890, value:4.6771617],
>     RecommendedItem[item:3530, value:4.662509],
>     RecommendedItem[item:127, value:4.660716],
>     RecommendedItem[item:3323, value:4.660716],
>     RecommendedItem[item:3382, value:4.660716],
>     RecommendedItem[item:3123, value:4.603366],
>     RecommendedItem[item:3233, value:4.5707765],
>     RecommendedItem[item:1434, value:4.553473],
>     RecommendedItem[item:989, value:4.5263577],
>     RecommendedItem[item:2343, value:4.524066]]
> Created recommendations in *985.679 ms*. Recommendations:[
>     RecommendedItem[item:3530, value:5.0],
>     RecommendedItem[item:3382, value:5.0],
>     RecommendedItem[item:3890, value:4.6771617],
>     RecommendedItem[item:127, value:4.660716],
>     RecommendedItem[item:3323, value:4.660716],
>     RecommendedItem[item:3123, value:4.603366],
>     RecommendedItem[item:3233, value:4.5707765],
>     RecommendedItem[item:1434, value:4.553473],
>     RecommendedItem[item:989, value:4.5263577],
>     RecommendedItem[item:2343, value:4.524066]]
>
> Again, almost the same results. What I also don't understand is why I am
> getting different RecommendedItems.
> That really frustrates me…
>
> You can find the Java file in the attachment.
>
>
> Greetings from Germany,
> Najum
>
> On 17.04.2014 at 11:44, Sebastian Schelter <[email protected]> wrote:
>
> Yes, just to make sure the problem is in the mahout code and not in the
> surrounding environment.
>
> On 04/17/2014 11:43 AM, Najum Ali wrote:
>
> @Sebastian
> What do you mean by a standalone recommender? A simple offline Java main
> program?
>
> On 17.04.2014 at 11:41, Sebastian Schelter <[email protected]> wrote:
>
> Could you take the output of the precomputation, feed it into a standalone
> recommender and test it there?
>
>
> On 04/17/2014 11:37 AM, Najum Ali wrote:
>
> @sebastian
>
> Are you sure that the precomputation is done only once and not in every
> request?
>
> Yes, a @Bean annotated object is a singleton instance in Spring by
> default.
> I also just tested it out using a System.out.println().
> Here is my log:
>
> System.out.println("----> precomputation done!") is called before returning
> the GenericItemSimilarity.
>
> The first two recommendations are item-based -> pearson similarity
> The third and fourth log are also item-based using precomputed similarity
> The last log is the user-based recommender using pearson
>
> Look at the huge time difference!
>
> On 17.04.2014 at 11:23, Sebastian Schelter <[email protected]> wrote:
>
> Najum,
>
> this is really strange, feeding an ItemBased Recommender with precomputed
> similarities should give you superfast recommendations.
>
> Are you sure that the precomputation is done only once and not in every
> request?
>
> --sebastian
>
> On 04/17/2014 11:17 AM, Najum Ali wrote:
>
> Hi guys,
>
> I have created a precomputed item-item-similarity collection for a
> GenericItemBasedRecommender.
> Using the 1M MovieLens data, my item-based recommender is only 40-50%
> faster than without precomputation (like 589.5ms instead of 1222.9ms).
> But the user-based recommender is really fast, it's like 24.2ms.
> How can this happen?
>
> Here are more details about my implementation:
>
> CSV File: 1M prefs, 6040 users, 3706 items
>
> I'm using screenshots of my implementation because of the good syntax
> highlighting.
> My recommender runs inside a webserver (Jetty) using Spring 4 and Java 8. I
> receive recommendations as a webservice (JSON).
>
> For the DataModel, I'm using FileDataModel.
>
>
> The code below creates a precomputed ItemSimilarity when I start the
> webserver and the property isItemPreComputationEnabled is set to true:
>
>
> For time measuring I'm using AOP. I'm measuring the whole time from
> entering my Controller to sending the response.
> The measurement is based on System.nanoTime() and taking the difference.
> It's the same time measurement for user-based.
>
> I have tried to cache the recommender and the similarity with no big
> difference. I also tried to use CandidateItemsStrategy and
> MostSimilarItemsCandidateItemsStrategy, but also no performance boost.
>
> public RecommenderBuilder createRecommenderBuilder(ItemSimilarity similarity)
>         throws TasteException {
>     final int numberOfUsers = dataModel.getNumUsers();
>     final int numberOfItems = dataModel.getNumItems();
>     CandidateItemsStrategy candidateItemsStrategy =
>         new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>     MostSimilarItemsCandidateItemsStrategy mostSimilarStrategy =
>         new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>     return model -> new GenericItemBasedRecommender(model,
>         similarity, candidateItemsStrategy, mostSimilarStrategy);
> }
>
> I don't know why item-based is taking so much longer than user-based.
> User-based is fast as hell. I even tried a data set using 100k prefs, and
> 10 million (MovieLens). Every time, user-based is so much faster for any
> similarity.
>
> Hope anyone can help me to understand this. Maybe I'm doing something
> wrong.
>
> Thanks!! :))
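
For the postprocessing step Sebastian describes (keeping only the 50 most
similar items per item), a minimal sketch in plain Java could look like the
following. It assumes every line of the similarity file has the form
itemA,itemB,similarity, which is what the mapToItemItemSimilarity parser in
the quoted code expects, and that "per item" means grouping on the first
column. The class name TopKSimilarities and the method retainTopK are made up
for illustration; they are not part of Mahout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.Stream;

// Hypothetical helper, not part of Mahout.
final class TopKSimilarities {

    /** One line of the similarity file plus its parsed similarity value. */
    private static final class Row {
        final String line;
        final double similarity;

        Row(String line, double similarity) {
            this.line = line;
            this.similarity = similarity;
        }
    }

    // Orders rows from weakest to strongest similarity.
    private static final Comparator<Row> BY_SIMILARITY =
        Comparator.comparingDouble((Row r) -> r.similarity);

    /**
     * Keeps at most k lines per item (first column), choosing the highest
     * similarities. Assumed line format: itemA,itemB,similarity.
     */
    static List<String> retainTopK(Iterable<String> lines, int k) {
        Map<Long, PriorityQueue<Row>> perItem = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            long itemA = Long.parseLong(fields[0].trim());
            double similarity = Double.parseDouble(fields[2].trim());
            // Min-heap per item: the weakest retained row sits at the head.
            PriorityQueue<Row> heap =
                perItem.computeIfAbsent(itemA, id -> new PriorityQueue<>(BY_SIMILARITY));
            heap.add(new Row(line, similarity));
            if (heap.size() > k) {
                heap.poll(); // drop the weakest row once more than k are held
            }
        }
        List<String> result = new ArrayList<>();
        for (PriorityQueue<Row> heap : perItem.values()) {
            List<Row> kept = new ArrayList<>(heap);
            kept.sort(BY_SIMILARITY.reversed()); // strongest first
            for (Row row : kept) {
                result.add(row.line);
            }
        }
        return result;
    }

    // Usage: java TopKSimilarities <input.csv> <output.csv> [k]
    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        int k = args.length > 2 ? Integer.parseInt(args[2]) : 50;
        try (Stream<String> lines = Files.lines(in)) {
            Files.write(out, retainTopK(lines::iterator, k));
        }
    }
}
```

The input is streamed and only k rows per item are held in memory, so the
260 MB file should not need to fit in the heap. The pruned file can then be
read back with the same parser as before. Separately, the quoted
preComputeSimilarities call passes dataModel.getNumItems() as simItemsPerItem;
passing a smaller number there might limit the output directly, but I have not
verified that, so treat it as something to check.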
