You can always run Hadoop in local mode. Nothing prevents a single node from being a cluster. :-)

On Thu, Apr 17, 2014 at 7:43 AM, Najum Ali <[email protected]> wrote:

> Ted,
>
> Is it also possible to use ItemSimilarityJob in a non-distributed
> environment?
>
> On 17.04.2014 at 16:22, Ted Dunning <[email protected]> wrote:
>
> > Najum,
> >
> > You should also be able to use the ItemSimilarityJob to compute a limited
> > indicator set.
> >
> > This is stepping off of the path you have been on, but it would allow you
> > to deploy the recommender via a search engine.
> >
> > That makes a lot of code simply vanish. This is also a well-trodden
> > production path.
> >
> > On Thu, Apr 17, 2014 at 3:57 AM, Najum Ali <[email protected]> wrote:
> >
> >> @Sebastian
> >>
> >> Wow … you are right. The original csv file is about 21 MB and the
> >> corresponding precomputed item-item similarity file is about 260 MB!
> >> And yes, there are far more than 50 "most similar items" for an item.
> >>
> >> Restricting this to the 50 (or so) most similar items for an item
> >> could do the trick, as you said.
> >> OK, I will give it a try and reply later.
> >>
> >> By the way, what about the SamplingCandidateItemsStrategy or something
> >> like this, by using this constructor:
> >>
> >> GenericItemBasedRecommender
> >> <https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.html#GenericItemBasedRecommender(org.apache.mahout.cf.taste.model.DataModel,%20org.apache.mahout.cf.taste.similarity.ItemSimilarity,%20org.apache.mahout.cf.taste.recommender.CandidateItemsStrategy,%20org.apache.mahout.cf.taste.recommender.MostSimilarItemsCandidateItemsStrategy)>
> >> (DataModel
> >> <https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/model/DataModel.html>
> >> dataModel,
> >> ItemSimilarity
> >> <https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/similarity/ItemSimilarity.html>
> >> similarity,
> >> CandidateItemsStrategy
> >> <https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/recommender/CandidateItemsStrategy.html>
> >> candidateItemsStrategy,
> >> MostSimilarItemsCandidateItemsStrategy
> >> <https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/recommender/MostSimilarItemsCandidateItemsStrategy.html>
> >> mostSimilarItemsCandidateItemsStrategy)
> >>
> >> On 17.04.2014 at 12:41, Sebastian Schelter <[email protected]> wrote:
> >>
> >> Hi Najum,
> >>
> >> I think I found the problem. Remember: Two items are similar whenever at
> >> least one user interacted with both of them ("the items co-occur").
> >>
> >> In the movielens dataset this is true for almost all pairs of items,
> >> unfortunately. From 3076 items, more than 11 million similarities are
> >> created. A common approach for that (which is not yet implemented in our
> >> precomputation unfortunately) is to only retain the top-k similar items
> >> per item.
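
A minimal sketch of the top-k postprocessing Sebastian describes here. This is not Mahout code: it assumes the similarity file holds one `itemA,itemB,similarity` row per line (the layout Najum's parser in this thread expects) and keeps the k highest-scoring rows for each value in the first column. The class name `TopKSimilarities` and the method `retainTopK` are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.Stream;

// Hypothetical helper, not part of Mahout.
public class TopKSimilarities {

    /** One parsed row of the similarity file: itemA,itemB,similarity. */
    static final class Row {
        final long itemA;
        final long itemB;
        final double similarity;

        Row(long itemA, long itemB, double similarity) {
            this.itemA = itemA;
            this.itemB = itemB;
            this.similarity = similarity;
        }

        String toCsv() {
            return itemA + "," + itemB + "," + similarity;
        }
    }

    /** Keeps, for every itemA, only the k rows with the highest similarity. */
    static List<String> retainTopK(Iterable<String> lines, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        Comparator<Row> bySimilarity = Comparator.comparingDouble((Row r) -> r.similarity);
        // One min-heap per item, so the weakest retained row is always on top.
        Map<Long, PriorityQueue<Row>> best = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] f = line.split(",");
            Row row = new Row(Long.parseLong(f[0].trim()),
                              Long.parseLong(f[1].trim()),
                              Double.parseDouble(f[2].trim()));
            PriorityQueue<Row> heap =
                best.computeIfAbsent(row.itemA, id -> new PriorityQueue<>(bySimilarity));
            heap.add(row);
            if (heap.size() > k) {
                heap.poll(); // drop the weakest row once more than k are held
            }
        }
        // Emit items in first-seen order, strongest similarity first.
        List<String> out = new ArrayList<>();
        for (PriorityQueue<Row> heap : best.values()) {
            List<Row> rows = new ArrayList<>(heap);
            rows.sort(bySimilarity.reversed());
            for (Row r : rows) {
                out.add(r.toCsv());
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);   // e.g. the precomputed similarities.csv
        Path out = Paths.get(args[1]);  // filtered output file
        try (Stream<String> lines = Files.lines(in)) {
            Files.write(out, retainTopK(lines::iterator, 50));
        }
    }
}
```

It could be run as `java TopKSimilarities similarities.csv top50.csv` (both file names are placeholders). Rows are grouped by the first column only, so if the code that loads the file treats each pair as symmetric, an item may still end up with more than k neighbours.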
> >>
> >> A solution would be to take the csv file that is created by the
> >> MultithreadedBatchItemSimilarities and postprocess it so that only the
> >> 50 most similar items per item are retained. That should help with your
> >> problem.
> >>
> >> Unfortunately, we don't have code for that yet, maybe you want to try to
> >> write that yourself?
> >>
> >> Best,
> >> Sebastian
> >>
> >> PS: The user-based recommender restricts the number of similar users, I
> >> guess that's why it is so fast here.
> >>
> >> On 04/17/2014 12:18 PM, Najum Ali wrote:
> >>
> >> Ok, here you go:
> >>
> >> I have created a simple class with a main method (no server and other
> >> stuff):
> >>
> >> public class RecommenderTest {
> >>   public static void main(String[] args) throws IOException, TasteException {
> >>     DataModel dataModel = new FileDataModel(new File(
> >>         "/Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv"));
> >>     ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
> >>     ItemBasedRecommender recommender =
> >>         new GenericItemBasedRecommender(dataModel, similarity);
> >>
> >>     String pathToPreComputedFile =
> >>         preComputeSimilarities(recommender, dataModel.getNumItems());
> >>
> >>     InputStream inputStream = new FileInputStream(new File(pathToPreComputedFile));
> >>     BufferedReader bufferedReader =
> >>         new BufferedReader(new InputStreamReader(inputStream));
> >>     Collection<GenericItemSimilarity.ItemItemSimilarity> correlations =
> >>         bufferedReader.lines().map(mapToItemItemSimilarity).collect(Collectors.toList());
> >>     ItemSimilarity precomputedSimilarity = new GenericItemSimilarity(correlations);
> >>     ItemBasedRecommender recommenderWithPrecomputation =
> >>         new GenericItemBasedRecommender(dataModel, precomputedSimilarity);
> >>
> >>     recommend(recommender);
> >>     recommend(recommenderWithPrecomputation);
> >>   }
> >>
> >>   private static String preComputeSimilarities(ItemBasedRecommender recommender,
> >>       int simItemsPerItem) throws TasteException {
> >>     String pathToAbsolutePath = "";
> >>     try {
> >>       File resultFile = new File(System.getProperty("java.io.tmpdir"),
> >>           "similarities.csv");
> >>       if (resultFile.exists()) {
> >>         resultFile.delete();
> >>       }
> >>       BatchItemSimilarities batchJob =
> >>           new MultithreadedBatchItemSimilarities(recommender, simItemsPerItem);
> >>       int numSimilarities = batchJob.computeItemSimilarities(
> >>           Runtime.getRuntime().availableProcessors(), 1,
> >>           new FileSimilarItemsWriter(resultFile));
> >>       pathToAbsolutePath = resultFile.getAbsolutePath();
> >>       System.out.println("Computed " + numSimilarities
> >>           + " similarities and saved them to " + pathToAbsolutePath);
> >>     } catch (IOException e) {
> >>       System.out.println("Error while writing pre computed similarities to file");
> >>     }
> >>     return pathToAbsolutePath;
> >>   }
> >>
> >>   private static void recommend(ItemBasedRecommender recommender)
> >>       throws TasteException {
> >>     long start = System.nanoTime();
> >>     List<RecommendedItem> recommendations = recommender.recommend(1, 10);
> >>     long end = System.nanoTime();
> >>     System.out.println("Created recommendations in "
> >>         + getCalculationTimeInMilliseconds(start, end)
> >>         + " ms. Recommendations:" + recommendations);
> >>   }
> >>
> >>   private static double getCalculationTimeInMilliseconds(long start, long end) {
> >>     double calculationTime = (end - start);
> >>     return (calculationTime / 1_000_000);
> >>   }
> >>
> >>   private static Function<String, GenericItemSimilarity.ItemItemSimilarity>
> >>       mapToItemItemSimilarity = (line) -> {
> >>     String[] row = line.split(",");
> >>     return new GenericItemSimilarity.ItemItemSimilarity(
> >>         Long.parseLong(row[0]), Long.parseLong(row[1]),
> >>         Double.parseDouble(row[2]));
> >>   };
> >> }
> >>
> >> And that's the output log:
> >>
> >> 3 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file /Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv
> >> 63 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
> >> 1207 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Processed 1000000 lines
> >> 1208 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 1000209
> >> 1475 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 6040 users
> >> 1599 [main] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - Queued 3706 items in 38 batches
> >> 10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches
> >> 10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches. done.
> >> 10978 [pool-1-thread-5] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 4 processed 4 batches. done.
> >> 11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches
> >> 11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches. done.
> >> 11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches
> >> 11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches. done.
> >> 11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches
> >> 11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches. done.
> >> 11730 [pool-1-thread-3] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 2 processed 4 batches. done.
> >> 11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches
> >> 11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches. done.
> >> 11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches
> >> 11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches. done.
> >> Computed 9174333 similarities and saved them to
> >> /var/folders/9g/4h38v1tj3ps9j21skc72b56r0000gn/T/similarities.csv
> >> Created recommendations in *1683.613 ms*.
> >> Recommendations:[RecommendedItem[item:3890, value:4.6771617],
> >> RecommendedItem[item:3530, value:4.662509], RecommendedItem[item:127,
> >> value:4.660716], RecommendedItem[item:3323, value:4.660716],
> >> RecommendedItem[item:3382, value:4.660716], RecommendedItem[item:3123,
> >> value:4.603366], RecommendedItem[item:3233, value:4.5707765],
> >> RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989,
> >> value:4.5263577], RecommendedItem[item:2343, value:4.524066]]
> >> Created recommendations in *985.679 ms*.
> >> Recommendations:[RecommendedItem[item:3530, value:5.0],
> >> RecommendedItem[item:3382, value:5.0], RecommendedItem[item:3890,
> >> value:4.6771617], RecommendedItem[item:127, value:4.660716],
> >> RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3123,
> >> value:4.603366], RecommendedItem[item:3233, value:4.5707765],
> >> RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989,
> >> value:4.5263577], RecommendedItem[item:2343, value:4.524066]]
> >>
> >> Again, almost the same timings. What I also don't understand is why I
> >> am getting different RecommendedItems.
> >> That really frustrates me…
> >>
> >> You can find the Java file in the attachment.
> >>
> >> Greetings from Germany,
> >> Najum
> >>
> >> On 17.04.2014 at 11:44, Sebastian Schelter <[email protected]> wrote:
> >>
> >> Yes, just to make sure the problem is in the mahout code and not in the
> >> surrounding environment.
> >>
> >> On 04/17/2014 11:43 AM, Najum Ali wrote:
> >>
> >> @Sebastian
> >> What do you mean by a standalone recommender? A simple offline Java main
> >> program?
> >>
> >> On 17.04.2014 at 11:41, Sebastian Schelter <[email protected]> wrote:
> >>
> >> Could you take the output of the precomputation, feed it into a
> >> standalone recommender and test it there?
> >>
> >> On 04/17/2014 11:37 AM, Najum Ali wrote:
> >>
> >> @sebastian
> >>
> >> Are you sure that the precomputation is done only once and not in every
> >> request?
> >>
> >> Yes, a @Bean-annotated object is a singleton instance by default in
> >> Spring.
> >> I also just tested it out using a System.out.println().
> >> Here is my log:
> >>
> >> System.out.println("----> precomputation done!") is called before
> >> returning the GenericItemSimilarity.
> >>
> >> The first two recommendations are item-based -> Pearson similarity.
> >> The third and fourth logs are also item-based, using precomputed similarity.
> >> The last log is the user-based recommender using Pearson.
> >>
> >> Look at the huge time difference!
> >>
> >> On 17.04.2014 at 11:23, Sebastian Schelter <[email protected]> wrote:
> >>
> >> Najum,
> >>
> >> this is really strange, feeding an ItemBased Recommender with
> >> precomputed similarities should give you superfast recommendations.
> >>
> >> Are you sure that the precomputation is done only once and not in every
> >> request?
> >>
> >> --sebastian
> >>
> >> On 04/17/2014 11:17 AM, Najum Ali wrote:
> >>
> >> Hi guys,
> >>
> >> I have created a precomputed item-item-similarity collection for a
> >> GenericItemBasedRecommender.
> >> Using the 1M MovieLens data, my item-based recommender is only 40-50%
> >> faster than without precomputation (like 589.5 ms instead of 1222.9 ms).
> >> But the user-based recommender is really fast, it's like 24.2 ms.
> >> How can this happen?
> >>
> >> Here are more details on my implementation:
> >>
> >> CSV file: 1M prefs, 6040 users, 3706 items
> >>
> >> For my implementation I'm using screenshots, because of the better
> >> highlighting.
> >> My recommender runs inside a web server (Jetty) using Spring 4 and
> >> Java 8. I receive recommendations as a web service (JSON).
> >>
> >> For the DataModel, I'm using FileDataModel.
> >>
> >> The code below creates a precomputed ItemSimilarity when I start the
> >> web server and the property isItemPreComputationEnabled is set to true:
> >>
> >> For time measuring I'm using AOP. I'm measuring the whole time from
> >> entering my controller to sending the response,
> >> based on System.nanoTime() and taking the difference. It's the same time
> >> measurement for user-based.
> >>
> >> I have tried to cache the recommender and the similarity, with no big
> >> difference. I also tried to use CandidateItemsStrategy and
> >> MostSimilarItemsCandidateItemsStrategy, but also no performance boost.
> >>
> >> public RecommenderBuilder createRecommenderBuilder(ItemSimilarity similarity)
> >>     throws TasteException {
> >>   final int numberOfUsers = dataModel.getNumUsers();
> >>   final int numberOfItems = dataModel.getNumItems();
> >>   CandidateItemsStrategy candidateItemsStrategy =
> >>       new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
> >>   MostSimilarItemsCandidateItemsStrategy mostSimilarStrategy =
> >>       new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
> >>   return model -> new GenericItemBasedRecommender(model,
> >>       similarity, candidateItemsStrategy, mostSimilarStrategy);
> >> }
> >>
> >> I don't know why item-based is taking so much longer than user-based.
> >> User-based is fast as hell. I even tried a data set using 100k prefs, and
> >> 10 million (MovieLens). Every time, the user-based one is so much faster
> >> for any similarity.
> >>
> >> Hope anyone can help me to understand this. Maybe I'm doing something
> >> wrong.
> >>
> >> Thanks!! :))
