Yes, this is one way to do it. I might instead advise you to wrap the ItemSimilarity in CachingItemSimilarity, and be done. It will compute only the item-item similarities you need, and the frequently used ones will stay in memory.
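To illustrate the idea: here is a minimal, self-contained sketch of what such a caching wrapper does (this is my own illustration of the pattern, not Mahout's actual CachingItemSimilarity implementation, and the class and method names below are hypothetical apart from clearCacheForItem, which mirrors the Mahout method name):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Sketch of a caching item-item similarity: compute each pair on demand,
// keep it in memory, and evict entries for an item when its data changes.
// Not Mahout's code; Mahout's CachingItemSimilarity does this for you.
public class CachingSimilaritySketch {
    private final Map<Long, Map<Long, Double>> cache = new HashMap<>();
    private final BiFunction<Long, Long, Double> delegate; // the expensive similarity

    public CachingSimilaritySketch(BiFunction<Long, Long, Double> delegate) {
        this.delegate = delegate;
    }

    public double itemSimilarity(long itemID1, long itemID2) {
        // Normalize key order so (a, b) and (b, a) share one cache entry.
        long lo = Math.min(itemID1, itemID2);
        long hi = Math.max(itemID1, itemID2);
        return cache.computeIfAbsent(lo, k -> new HashMap<>())
                    .computeIfAbsent(hi, k -> delegate.apply(lo, hi));
    }

    // Analogous to CachingItemSimilarity.clearCacheForItem(long):
    // drop every cached pair involving this item.
    public void clearCacheForItem(long itemID) {
        cache.remove(itemID);
        for (Map<Long, Double> row : cache.values()) {
            row.remove(itemID);
        }
    }
}
```

In Mahout itself the wrapping is a one-liner around your existing similarity, so your recommender code otherwise stays unchanged.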
You can then clear the cache for particular item IDs when you know they've
changed. (Although I would further suggest you don't need to invalidate the
cache every time. If you already have a value based on a few data points, the
next one isn't likely to change the value much. You might recompute less
frequently.) Timestamp does not have any effect on any of this, no.

On Tue, Jun 21, 2011 at 3:25 PM, Daniel Xiaodan Zhou <[email protected]> wrote:

> Hi,
>
> I'd like to compute item-based similarity offline, and then
> incrementally update item-similarity for new items. Can someone please
> confirm that the code snippet below is the right way to do it? In
> particular, I'd like to know whether I should use refresh() at all. I
> think refresh() is designed for incremental updates, but I don't know
> how to use it in my code. Also, I read the implementations of refresh()
> in DataModel, Similarity and Recommender, but I didn't see the use of
> "timestamp" anywhere. Does refresh() use "timestamp" to update
> similarity/prediction scores incrementally, or does it only clear the
> cache? Is the "timestamp" column used at all? Thanks.
>
> After getting answers here, I'll document this on the Wiki page as well.
>
> The code snippet is also at: http://pastebin.com/yyLr14ja
>
> // compute item-similarity offline
> def RecommenderInitialUpdateOffline() {
>     // first-time running the recommender
>     JDBCDataModel model = new MySQLJDBCDataModel(dataSource,
>         "ratings", "user_id", "item_id", "rating", "updated");
>     ItemSimilarity itemSimilarity = new PearsonCorrelationSimilarity(model);
>     GenericItemBasedRecommender recommender = new
>         GenericItemBasedRecommender(model, itemSimilarity);
>
>     // compute the initial item-similarity data and persist it. time-consuming.
>     LongPrimitiveIterator iterator = model.getItemIDs();
>     while (iterator.hasNext()) {
>         long itemID1 = iterator.nextLong();
>         for (RecommendedItem item :
>                 recommender.mostSimilarItems(itemID1, maxKeep)) {
>             long itemID2 = item.getItemID();
>             double score = item.getValue();
>             // save the item-similarity data in a file or database.
>             saveSimilarity(itemID1, itemID2, score);
>         }
>     }
> }
>
> // incrementally update item-similarity and compute recommendations online
> def RecommenderIncrementalUpdate() {
>     // in an incremental update, we still need to initialize these
>     // objects to compute similarity/preference data for new users/items
>     JDBCDataModel model = new MySQLJDBCDataModel(dataSource,
>         "ratings", "user_id", "item_id", "rating", "updated");
>     ItemSimilarity itemSimilarity = new PearsonCorrelationSimilarity(model);
>     GenericItemBasedRecommender recommender = new
>         GenericItemBasedRecommender(model, itemSimilarity);
>
>     // incrementally compute and save similarity data for new items.
>     // existing similarity data is relatively stable, and we won't
>     // touch it; thus we can save a lot of computation time.
>     List<Long> newItems = loadNewItems();
>     for (long itemID1 : newItems) {
>         for (RecommendedItem item :
>                 recommender.mostSimilarItems(itemID1, maxKeep)) {
>             long itemID2 = item.getItemID();
>             double score = item.getValue();
>             // append the new item-similarity data to a file or database.
>             saveSimilarity(itemID1, itemID2, score);
>         }
>     }
>
>     // reload all similarity data from persistent storage.
>     List<GenericItemSimilarity.ItemItemSimilarity> ii = loadSimilarity();
>     GenericItemSimilarity reloadSimilarity = new GenericItemSimilarity(ii);
>     GenericItemBasedRecommender reloadRecommender = new
>         GenericItemBasedRecommender(model, reloadSimilarity);
>     // print a prediction score for a user-item pair
>     println reloadRecommender.estimatePreference(user_id, item_id);
> }
