Dear Mahout community,

I would like to introduce a set of tools for recommender systems that I implemented as part of my MSc thesis. The work was inspired by our conversations on the user list, and I tried to stick to the existing Taste framework so that it could be contributed to Mahout.
The library is available at github.com/gcapan/recommender <http://github.com/gcapan>. It contains Stochastic Gradient Descent based learning algorithms for Matrix Factorization based recommendation. The core features of the library are listed below:

1- It handles different recommendation targets (feedback types), namely:
   - Standard numerical recommendation with OLS Regression
   - Binary recommendation with Logistic Regression
   - Multinomial recommendation with Softmax Regression
   - Ordinal recommendation with the Proportional Odds Model
   - Predicting counts with Poisson Regression (still experimental)
2- It can use side information from users and items, if available.
3- It can leverage what I call dynamic side information, meaning features whose values are determined at feedback time (e.g. day of week for its possible effect on people's choices, proximity for location-aware recommendation, etc.).
4- The learning algorithm is online and thus scalable. However, the model is currently stored in memory. I plan to extend it to store the model in HBase, too.

The recommenders implement Mahout's Recommender interface. For experiments, I have implemented a GenericIncrementalDataModel (in memory) and List-based PreferenceArrays. I tried to use Mahout's data structures where available. For example, factor vectors and side info vectors are in Mahout's vector format.

These algorithms are heavily inspired by various influential Recommender System papers, especially those of Yehuda Koren. For example, the Ordinal model is from Koren's OrdRec paper, except that the cuts are global rather than user-specific.

I tried the numerical recommender on the MovieLens-1M dataset, and it achieved around 0.851 RMSE with 150 factors and 30 iterations.

The code is tested, but not fully documented. With some effort, it can be integrated into Mahout. If it has the potential to be beneficial for Mahout users, I will be happy to contribute it to the ASF with your guidance. Any feedback is appreciated.
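For readers less familiar with the approach, the numerical (squared-loss) case of SGD-based matrix factorization can be sketched roughly as follows. This is only an illustration and is not taken from the library: the class name SgdFactorizationSketch, the plain double arrays (instead of Mahout vectors), the learning rate, the regularization weight, and the toy ratings are all assumptions made for the example. Side information and the other feedback types are left out.

```java
import java.util.Random;

/** Illustrative sketch only (hypothetical class, not the library's code). */
class SgdFactorizationSketch {
    final double[][] userFactors;
    final double[][] itemFactors;
    final double learningRate; // assumed value, chosen for the toy data
    final double lambda;       // assumed L2 regularization weight

    SgdFactorizationSketch(int numUsers, int numItems, int numFactors,
                           double learningRate, double lambda, long seed) {
        this.learningRate = learningRate;
        this.lambda = lambda;
        Random rnd = new Random(seed);
        userFactors = randomMatrix(numUsers, numFactors, rnd);
        itemFactors = randomMatrix(numItems, numFactors, rnd);
    }

    // Small random initial factors so that training can break symmetry.
    private static double[][] randomMatrix(int rows, int cols, Random rnd) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int k = 0; k < cols; k++) {
                row[k] = 0.1 * rnd.nextGaussian();
            }
        }
        return m;
    }

    /** Predicted rating: dot product of the user and item factor vectors. */
    double predict(int user, int item) {
        double dot = 0.0;
        for (int k = 0; k < userFactors[user].length; k++) {
            dot += userFactors[user][k] * itemFactors[item][k];
        }
        return dot;
    }

    /** One online SGD step from a single (user, item, rating) observation. */
    void update(int user, int item, double rating) {
        double err = rating - predict(user, item);
        double[] p = userFactors[user];
        double[] q = itemFactors[item];
        for (int k = 0; k < p.length; k++) {
            double pk = p[k]; // keep the old value for the item update
            p[k] += learningRate * (err * q[k] - lambda * pk);
            q[k] += learningRate * (err * pk - lambda * q[k]);
        }
    }

    /** Sum of squared errors over (user, item, rating) triples. */
    double squaredError(double[][] data) {
        double sum = 0.0;
        for (double[] d : data) {
            double e = d[2] - predict((int) d[0], (int) d[1]);
            sum += e * e;
        }
        return sum;
    }

    /** Toy observations, invented for the example. */
    static double[][] toyData() {
        return new double[][] {{0, 0, 5}, {0, 1, 1}, {1, 0, 1}, {1, 1, 5}};
    }

    public static void main(String[] args) {
        double[][] data = toyData();
        SgdFactorizationSketch model =
            new SgdFactorizationSketch(2, 2, 2, 0.01, 0.01, 42L);
        System.out.println("error before: " + model.squaredError(data));
        for (int epoch = 0; epoch < 5000; epoch++) {
            for (double[] d : data) {
                model.update((int) d[0], (int) d[1], d[2]);
            }
        }
        System.out.println("error after:  " + model.squaredError(data));
    }
}
```

Because each update touches only one user vector and one item vector, observations can be processed as they arrive, which is what makes this kind of learner online. The other feedback types listed above would, in a sketch like this, replace the squared-loss error term with the gradient of the corresponding regression loss.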
Regards -- Gokhan