Mahout’s Hadoop ALS recommender, run from the CLI, calculates all recs for all users. Spark’s MLlib will take a user’s history and return recs, so it can account for new events.
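One common way to get recs from a user's latest history without refactorizing everything is to "fold in" the user. You hold the item-factor matrix fixed and solve a single regularized least-squares problem for that user's factor vector, which is one half-step of ALS. The sketch below is a minimal NumPy illustration under stated assumptions: explicit ratings and plain L2 regularization. It is not Mahout's or MLlib's code, and the names `fold_in_user` and `recommend` are hypothetical. For purchase data, the implicit-feedback (confidence-weighted) variant of ALS would be the closer fit.

```python
import numpy as np

def fold_in_user(item_factors, item_ids, ratings, reg=0.1):
    """Hypothetical helper: solve for one user's factor vector x_u with
    the item factors Y held fixed, minimizing
        sum_i (r_ui - x_u . y_i)^2 + reg * ||x_u||^2
    over only the items the user has interacted with."""
    Y = item_factors[item_ids]            # (n_seen, k) factors of seen items
    r = np.asarray(ratings, dtype=float)  # (n_seen,) the user's ratings
    k = item_factors.shape[1]
    A = Y.T @ Y + reg * np.eye(k)         # normal equations: (Y^T Y + reg*I)
    b = Y.T @ r                           # right-hand side: Y^T r
    return np.linalg.solve(A, b)          # x_u, shape (k,)

def recommend(item_factors, user_vec, exclude, n=5):
    """Hypothetical helper: rank all items by predicted score x_u . y_i,
    skipping items the user already has."""
    scores = item_factors @ user_vec      # new array, one score per item
    scores[list(exclude)] = -np.inf       # never re-recommend seen items
    return np.argsort(-scores)[:n].tolist()

if __name__ == "__main__":
    # Toy item factors (k = 2); in practice these come from the last full ALS run.
    Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
    x = fold_in_user(Y, [0, 1, 2], [2.0, 3.0, 5.0], reg=1e-9)
    print(x, recommend(Y, x, exclude={0, 1, 2}, n=1))
```

Each update costs one k×k solve per user, which is why fold-in is cheap compared with re-running the whole factorization. The item factors still drift as new data and new items arrive, so this approach assumes a periodic full retrain.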
Mahout is now suggesting the use of a multimodal cooccurrence-based recommender instead. Spark calculates a model, you index the model in a search engine, and the user's history becomes the query. This also supports realtime user actions and recs for users not in the training data. See references here:

http://mahout.apache.org/users/algorithms/recommender-overview.html
http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/

On Jul 2, 2015, at 5:29 AM, Ramiz <[email protected]> wrote:

Hi,

I was wondering how we can handle new data on existing users. Do we have to run the Matrix Factorization job on all users and items again every time a new item is preferred by an existing user? Or can we take the new data of an existing user, perform user rating, and run the recommender job using the previously computed user-feature and item-feature matrices?

The reason I'm asking is that my team and I are working with transaction data of 17 million customers and close to 170k unique items. We have close to half a billion records, so the factorization job takes a lot of time. Ideally we want to recommend items to users based on their most recent behavior. The only way to do that is to update the utility matrix of a given user vector with his most recent purchase. But if we do that, do I have to re-run the factorization job, or is there a way around this?
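The cooccurrence approach suggested in the reply splits the work into two steps. The offline step computes, for each item, a short list of "indicator" items that significantly co-occur with it. The online step uses the user's current history as a query against those lists. The sketch below illustrates both steps in plain Python for a single action type, scoring item pairs with Dunning's log-likelihood ratio (LLR), the test Mahout uses to filter cooccurrences. The search engine is replaced by a simple count of matching indicators. A real engine such as Solr or Elasticsearch would also apply TF-IDF-style weighting. Cross-occurrence between different event types (the "multimodal" part) is not shown. The function names and the `top_k` cutoff are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def xlogx(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy, as used in Dunning's LLR formulation.
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table: k11 = users with both items,
    k12 = A only, k21 = B only, k22 = neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    if row + col < mat:  # guard against tiny negative rounding error
        return 0.0
    return 2.0 * (row + col - mat)

def indicators(histories, top_k=2, min_llr=0.0):
    """Offline step (hypothetical helper): histories maps user -> set of
    items. Returns item -> its top_k indicator items ranked by LLR."""
    n_users = len(histories)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in histories.values():
        for i in items:
            item_count[i] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1
    scored = defaultdict(list)
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11          # users with a but not b
        k21 = item_count[b] - k11          # users with b but not a
        k22 = n_users - k11 - k12 - k21    # users with neither
        s = llr(k11, k12, k21, k22)
        if s > min_llr:
            scored[a].append((s, b))
            scored[b].append((s, a))
    return {i: [j for _, j in sorted(v, key=lambda t: (-t[0], t[1]))[:top_k]]
            for i, v in scored.items()}

def query(index, history, n=3):
    """Online step (hypothetical helper): the user's recent history is the
    query; each candidate scores one hit per indicator found in it."""
    hits = {item: sum(1 for j in inds if j in history)
            for item, inds in index.items() if item not in history}
    ranked = sorted((i for i, h in hits.items() if h > 0),
                    key=lambda i: (-hits[i], i))
    return ranked[:n]

if __name__ == "__main__":
    histories = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a", "b"},
                 "u4": {"c", "d"}, "u5": {"c", "d"}, "u6": {"c"}}
    idx = indicators(histories)
    print(idx, query(idx, {"a"}), query(idx, {"c"}))
```

Because the query is just the user's current history, a purchase made a minute ago can change the recs immediately, and users who were absent from the training data can still be served. Only the indicator lists need periodic recomputation.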
