Shouldn't, yes. But for a toy dataset, it might work out.
On Fri, Apr 18, 2014 at 10:25 AM, Sebastian Schelter wrote:

> You can, but you shouldn't :)
>
> On 04/18/2014 07:23 PM, Ted Dunning wrote:
>
>> You can always run Hadoop in a local mode. Nothing prevents a single node
>> from being a cluster. :-)
>>
>> On Thu, Apr 17, 2014 at 7:43 AM, Najum Ali wrote:
>>
>>> Ted,
>>>
>>> Is it also possible to use ItemSimilarityJob in a non-distributed
>>> environment?
>>>
>>> On 17.04.2014 at 16:22, Ted Dunning wrote:
>>>
>>>> Najum,
>>>>
>>>> You should also be able to use the ItemSimilarityJob to compute a
>>>> limited indicator set.
>>>>
>>>> This is stepping off of the path you have been on, but it would allow
>>>> you to deploy the recommender via a search engine.
>>>>
>>>> That makes a lot of code simply vanish. This is also a well-trod
>>>> production path.
>>>>
>>>> On Thu, Apr 17, 2014 at 3:57 AM, Najum Ali wrote:
>>>>
>>>>> @Sebastian
>>>>>
>>>>> Wow … you are right. The original csv file is about 21 MB and the
>>>>> corresponding precomputed item-item similarity file is about 260 MB!!
>>>>> And yes, there are far more than 50 "most similar items" per item.
>>>>>
>>>>> Restricting this to the 50 (or so) most similar items per item could
>>>>> do the trick, as you said. OK, I will give it a try and reply later.
>>>>>
>>>>> By the way, what about SamplingCandidateItemsStrategy or something
>>>>> like it, using this constructor:
>>>>>
>>>>> GenericItemBasedRecommender(DataModel dataModel,
>>>>>     ItemSimilarity similarity,
>>>>>     CandidateItemsStrategy candidateItemsStrategy,
>>>>>     MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy)
>>>>>
>>>>> Constructor:
>>>>> https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.html#GenericItemBasedRecommender(org.apache.mahout.cf.taste.model.DataModel,%20org.apache.mahout.cf.taste.similarity.ItemSimilarity,%20org.apache.mahout.cf.taste.recommender.CandidateItemsStrategy,%20org.apache.mahout.cf.taste.recommender.MostSimilarItemsCandidateItemsStrategy)
>>>>> DataModel:
>>>>> https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/model/DataModel.html
>>>>> ItemSimilarity:
>>>>> https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/similarity/ItemSimilarity.html
>>>>> CandidateItemsStrategy:
>>>>> https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/recommender/CandidateItemsStrategy.html
>>>>> MostSimilarItemsCandidateItemsStrategy:
>>>>> https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/recommender/MostSimilarItemsCandidateItemsStrategy.html
>>>>>
>>>>>
>>>>> On 17.04.2014 at 12:41, Sebastian Schelter wrote:
>>>>>
>>>>> Hi Najum,
>>>>>
>>>>> I think I found the problem. Remember: Two items are similar whenever
>>>>> at least one user interacted with both of them ("the items co-occur").
>>>>>
>>>>> In the movielens dataset this is true for almost all pairs of items,
>>>>> unfortunately. From 3076 items, more than 11 million similarities are
>>>>> created. A common approach for that (which is not yet implemented in
>>>>> our precomputation unfortunately) is to only retain the top-k similar
>>>>> items per item.
>>>>>
>>>>> A solution would be to take the csv file that is created by the
>>>>> MultithreadedBatchItemSimilarities and postprocess it so that only the
>>>>> 50 most similar items per item are retained. That should help with
>>>>> your problem.
>>>>>
>>>>> Unfortunately, we don't have code for that yet, maybe you want to try
>>>>> to write that yourself?
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> PS: The user-based recommender restricts the number of similar users,
>>>>> I guess that's why it is so fast here.
>>>>>
>>>>>
>>>>> On 04/17/2014 12:18 PM, Najum Ali wrote:
>>>>>
>>>>> Ok, here you go:
>>>>>
>>>>> I have created a simple class with a main method (no server and other
>>>>> stuff):
>>>>>
>>>>> public class RecommenderTest {
>>>>>
>>>>>     public static void main(String[] args) throws IOException, TasteException {
>>>>>         DataModel dataModel = new FileDataModel(new File(
>>>>>             "/Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv"));
>>>>>         ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
>>>>>         ItemBasedRecommender recommender =
>>>>>             new GenericItemBasedRecommender(dataModel, similarity);
>>>>>
>>>>>         String pathToPreComputedFile =
>>>>>             preComputeSimilarities(recommender, dataModel.getNumItems());
>>>>>
>>>>>         InputStream inputStream = new FileInputStream(new File(pathToPreComputedFile));
>>>>>         BufferedReader bufferedReader =
>>>>>             new BufferedReader(new InputStreamReader(inputStream));
>>>>>         Collection<GenericItemSimilarity.ItemItemSimilarity> correlations =
>>>>>             bufferedReader.lines().map(mapToItemItemSimilarity).collect(Collectors.toList());
>>>>>         ItemSimilarity precomputedSimilarity = new GenericItemSimilarity(correlations);
>>>>>         ItemBasedRecommender recommenderWithPrecomputation =
>>>>>             new GenericItemBasedRecommender(dataModel, precomputedSimilarity);
>>>>>
>>>>>         recommend(recommender);
>>>>>         recommend(recommenderWithPrecomputation);
>>>>>     }
>>>>>
>>>>>     private static String preComputeSimilarities(ItemBasedRecommender recommender,
>>>>>             int simItemsPerItem) throws TasteException {
>>>>>         String pathToAbsolutePath = "";
>>>>>         try {
>>>>>             File resultFile =
>>>>>                 new File(System.getProperty("java.io.tmpdir"), "similarities.csv");
>>>>>             if (resultFile.exists()) {
>>>>>                 resultFile.delete();
>>>>>             }
>>>>>             BatchItemSimilarities batchJob =
>>>>>                 new MultithreadedBatchItemSimilarities(recommender, simItemsPerItem);
>>>>>             int numSimilarities = batchJob.computeItemSimilarities(
>>>>>                 Runtime.getRuntime().availableProcessors(), 1,
>>>>>                 new FileSimilarItemsWriter(resultFile));
>>>>>             pathToAbsolutePath = resultFile.getAbsolutePath();
>>>>>             System.out.println("Computed " + numSimilarities
>>>>>                 + " similarities and saved them to " + pathToAbsolutePath);
>>>>>         } catch (IOException e) {
>>>>>             System.out.println("Error while writing pre computed similarities to file");
>>>>>         }
>>>>>         return pathToAbsolutePath;
>>>>>     }
>>>>>
>>>>>     private static void recommend(ItemBasedRecommender recommender)
>>>>>             throws TasteException {
>>>>>         long start = System.nanoTime();
>>>>>         List<RecommendedItem> recommendations = recommender.recommend(1, 10);
>>>>>         long end = System.nanoTime();
>>>>>         System.out.println("Created recommendations in "
>>>>>             + getCalculationTimeInMilliseconds(start, end)
>>>>>             + " ms. Recommendations:" + recommendations);
>>>>>     }
>>>>>
>>>>>     private static double getCalculationTimeInMilliseconds(long start, long end) {
>>>>>         double calculationTime = (end - start);
>>>>>         return (calculationTime / 1_000_000);
>>>>>     }
>>>>>
>>>>>     private static Function<String, GenericItemSimilarity.ItemItemSimilarity>
>>>>>             mapToItemItemSimilarity = (line) -> {
>>>>>         String[] row = line.split(",");
>>>>>         return new GenericItemSimilarity.ItemItemSimilarity(
>>>>>             Long.parseLong(row[0]), Long.parseLong(row[1]),
>>>>>             Double.parseDouble(row[2]));
>>>>>     };
>>>>> }
>>>>>
>>>>> And that's the output log:
>>>>>
>>>>> 3 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file /Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv
>>>>> 63 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
>>>>> 1207 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Processed 1000000 lines
>>>>> 1208 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 1000209
>>>>> 1475 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 6040 users
>>>>> 1599 [main] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - Queued 3706 items in 38 batches
>>>>> 10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches
>>>>> 10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches. done.
>>>>> 10978 [pool-1-thread-5] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 4 processed 4 batches. done.
>>>>> 11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches
>>>>> 11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches. done.
>>>>> 11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches
>>>>> 11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches. done.
>>>>> 11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches
>>>>> 11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches. done.
>>>>> 11730 [pool-1-thread-3] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 2 processed 4 batches. done.
>>>>> 11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches
>>>>> 11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches. done.
>>>>> 11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches
>>>>> 11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches. done.
>>>>> Computed 9174333 similarities and saved them to /var/folders/9g/4h38v1tj3ps9j21skc72b56r0000gn/T/similarities.csv
>>>>> Created recommendations in *1683.613 ms*. Recommendations:[RecommendedItem[item:3890, value:4.6771617], RecommendedItem[item:3530, value:4.662509], RecommendedItem[item:127, value:4.660716], RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3382, value:4.660716], RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765], RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577], RecommendedItem[item:2343, value:4.524066]]
>>>>> Created recommendations in *985.679 ms*. Recommendations:[RecommendedItem[item:3530, value:5.0], RecommendedItem[item:3382, value:5.0], RecommendedItem[item:3890, value:4.6771617], RecommendedItem[item:127, value:4.660716], RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765], RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577], RecommendedItem[item:2343, value:4.524066]]
>>>>>
>>>>> Again, almost the same results. What I also don't understand is why I
>>>>> am getting different recommended items.
>>>>> That really frustrates me…
>>>>>
>>>>> You can find the Java file in the attachment.
>>>>>
>>>>> Greetings from Germany,
>>>>> Najum
>>>>>
>>>>>
>>>>> On 17.04.2014 at 11:44, Sebastian Schelter wrote:
>>>>>
>>>>> Yes, just to make sure the problem is in the mahout code and not in
>>>>> the surrounding environment.
>>>>>
>>>>>
>>>>> On 04/17/2014 11:43 AM, Najum Ali wrote:
>>>>>
>>>>> @Sebastian
>>>>> What do you mean by a standalone recommender? A simple offline Java
>>>>> main program?
>>>>>
>>>>>
>>>>> On 17.04.2014 at 11:41, Sebastian Schelter wrote:
>>>>>
>>>>> Could you take the output of the precomputation, feed it into a
>>>>> standalone recommender and test it there?
>>>>>
>>>>>
>>>>> On 04/17/2014 11:37 AM, Najum Ali wrote:
>>>>>
>>>>> @sebastian
>>>>>
>>>>> Are you sure that the precomputation is done only once and not in
>>>>> every request?
>>>>>
>>>>> Yes, a @Bean-annotated object in Spring is a singleton instance by
>>>>> default.
>>>>> I also just tested it out using a System.out.println().
>>>>> Here is my log:
>>>>>
>>>>> System.out.println("----> precomputation done!") is called before
>>>>> returning the GenericItemSimilarity.
>>>>>
>>>>> The first two recommendations are item-based -> Pearson similarity.
>>>>> The third and fourth logs are also item-based, using precomputed
>>>>> similarity.
>>>>> The last log is the user-based recommender using Pearson.
>>>>>
>>>>> Look at the huge time difference!
>>>>>
>>>>>
>>>>> On 17.04.2014 at 11:23, Sebastian Schelter wrote:
>>>>>
>>>>> Najum,
>>>>>
>>>>> this is really strange, feeding an ItemBased Recommender with
>>>>> precomputed similarities should give you superfast recommendations.
>>>>>
>>>>> Are you sure that the precomputation is done only once and not in
>>>>> every request?
>>>>>
>>>>> --sebastian
>>>>>
>>>>>
>>>>> On 04/17/2014 11:17 AM, Najum Ali wrote:
>>>>>
>>>>> Hi guys,
>>>>>
>>>>> I have created a precomputed item-item similarity collection for a
>>>>> GenericItemBasedRecommender.
>>>>> Using the 1M MovieLens data, my item-based recommender is only 40-50%
>>>>> faster than without precomputation (like 589.5 ms instead of
>>>>> 1222.9 ms). But the user-based recommender is really fast, it's like
>>>>> 24.2 ms. How can this happen?
>>>>>
>>>>> Here are more details about my implementation:
>>>>>
>>>>> CSV file: 1M prefs, 6040 users, 3706 items
>>>>>
>>>>> For my implementation I'm using screenshots, because of the nicer
>>>>> highlighting.
>>>>> My recommender runs inside a web server (Jetty) using Spring 4 and
>>>>> Java 8. I receive recommendations as a web service (JSON).
>>>>>
>>>>> For the DataModel, I'm using FileDataModel.
>>>>>
>>>>> This code below creates a precomputed ItemSimilarity when I start the
>>>>> web server and the property isItemPreComputationEnabled is set to
>>>>> true:
>>>>>
>>>>> For timing I'm using AOP. I measure the whole time from entering my
>>>>> controller to sending the response, based on System.nanoTime() and
>>>>> taking the difference. The same measurement is used for the
>>>>> user-based recommender.
>>>>>
>>>>> I have tried to cache the recommender and the similarity, with no big
>>>>> difference. I also tried to use CandidateItemsStrategy and
>>>>> MostSimilarItemsCandidateItemsStrategy, but also no performance boost.
>>>>>
>>>>> public RecommenderBuilder createRecommenderBuilder(ItemSimilarity similarity)
>>>>>         throws TasteException {
>>>>>     final int numberOfUsers = dataModel.getNumUsers();
>>>>>     final int numberOfItems = dataModel.getNumItems();
>>>>>     CandidateItemsStrategy candidateItemsStrategy =
>>>>>         new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>>>>>     MostSimilarItemsCandidateItemsStrategy mostSimilarStrategy =
>>>>>         new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>>>>>     return model -> new GenericItemBasedRecommender(model,
>>>>>         similarity, candidateItemsStrategy, mostSimilarStrategy);
>>>>> }
>>>>>
>>>>> I don't know why item-based is taking so much longer than user-based.
>>>>> User-based is fast as hell. I even tried datasets with 100k prefs and
>>>>> 10 million prefs (MovieLens). Every time, user-based is so much faster
>>>>> for any similarity.
>>>>>
>>>>> Hope anyone can help me to understand this. Maybe I'm doing something
>>>>> wrong.
>>>>>
>>>>> Thanks!! :))
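
For the postprocessing step suggested in the quoted thread (retain only the 50 most similar items per item from the precomputed csv), a minimal sketch using only the JDK could look like the following. It is not Mahout code: the class name TopKSimilarities and the method topKPerItem are made up for illustration. It assumes each csv line has the form itemA,itemB,similarity (the layout the mapToItemItemSimilarity parser in the thread reads) and that column 0 is the item whose neighbours are being listed. It also reads the whole file into memory, which may need a larger heap for a file of a few hundred MB.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout.
final class TopKSimilarities {

    // For each item ID in column 0, keeps the k lines with the highest
    // similarity value (column 2). Assumes lines look like "itemA,itemB,similarity".
    static List<String> topKPerItem(List<String> lines, int k) {
        // Group lines by the first item ID, preserving first-seen order.
        Map<Long, List<String>> byItem = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            long itemId = Long.parseLong(line.split(",")[0].trim());
            byItem.computeIfAbsent(itemId, id -> new ArrayList<>()).add(line);
        }

        // Sort each group by similarity, highest first, and keep at most k lines.
        Comparator<String> bySimilarityDesc = Comparator
            .comparingDouble((String l) -> Double.parseDouble(l.split(",")[2].trim()))
            .reversed();
        List<String> result = new ArrayList<>();
        for (List<String> group : byItem.values()) {
            group.sort(bySimilarityDesc);
            result.addAll(group.subList(0, Math.min(k, group.size())));
        }
        return result;
    }

    // Usage: java TopKSimilarities <input.csv> <output.csv> [k]
    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        int k = args.length > 2 ? Integer.parseInt(args[2]) : 50;
        List<String> kept = topKPerItem(Files.readAllLines(in, StandardCharsets.UTF_8), k);
        Files.write(out, kept, StandardCharsets.UTF_8);
        System.out.println("Kept " + kept.size() + " similarities");
    }
}
```

The reduced file can then be loaded the same way as in the RecommenderTest class from the thread. Whether this actually brings the item-based recommender down to the expected response times was not measured here.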
