By the way, I should mention that my thesis is advised by Ozgur Yilmazel, a founding member of the Mahout project. I conducted this study under his guidance and kept the implementation easy to integrate into Mahout.
On Thu, Sep 6, 2012 at 6:04 PM, Gokhan Capan <gkhn...@gmail.com> wrote:

> Dear Mahout community,
>
> I would like to introduce a set of tools for recommender systems that I
> implemented as part of my MSc thesis. This work was inspired by our
> conversations on the user list, and I tried to keep it close to the
> existing Taste framework for possible contribution to Mahout.
>
> The library is available at
> github.com/gcapan/recommender<http://github.com/gcapan>.
>
> The library contains Stochastic Gradient Descent based learning algorithms
> for Matrix Factorization based recommendation.
>
> Core features of the library are listed below:
>
> 1- It handles different recommendation targets (feedback), namely:
>    - Standard numerical recommendation with OLS Regression
>    - Binary recommendation with Logistic Regression
>    - Multinomial recommendation with Softmax Regression
>    - Ordinal recommendation with Proportional Odds Model
>    - Predicting counts with Poisson Regression (still experimental)
>
> 2- It may use side information from users and items if available
>
> 3- It may leverage dynamic side information (this is my own term for it),
>    meaning features whose values are determined at feedback time (e.g. day
>    of week for a possible effect on people's choices, proximity for
>    location-aware recommendation, etc.)
>
> 4- It is an online learning algorithm and thus scalable. However, the
>    model is currently stored in memory. I plan to extend it to store the
>    model in HBase, too.
>
> The recommenders implement Mahout's Recommender interface. For
> experiments, I have implemented a GenericIncrementalDataModel (in memory)
> and List-based PreferenceArrays.
>
> I tried to use Mahout's data structures where available. For example,
> factor vectors and side info vectors are in Mahout's vector format.
>
> These algorithms are heavily inspired by various influential recommender
> system papers, especially those by Yehuda Koren. For example, the Ordinal
> model is from Koren's OrdRec paper, except that the cuts are global rather
> than user-specific.
>
> I tried the numerical recommender on the MovieLens-1M dataset, and it
> achieved around 0.851 RMSE with 150 factors and 30 iterations.
>
> The code is tested, but not fully documented.
>
> With some effort, the code can be integrated into Mahout. If it has the
> potential to be beneficial for Mahout users, I will be happy to contribute
> it to the ASF with your guidance.
>
> Any feedback is appreciated.
>
> Regards
>
> --
> Gokhan

--
Gokhan
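
For readers unfamiliar with the approach, the numerical (squared loss) case of SGD-based matrix factorization can be sketched roughly as below. This is only a minimal illustration and not code from the library: the class name SgdFactorizer, its constructor parameters, and its method names are hypothetical, and it assumes plain in-memory maps instead of Mahout's vector classes. It also leaves out bias terms, side information, and the other target types listed above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch: online SGD matrix factorization with squared loss. */
class SgdFactorizer {
    private final int numFactors;
    private final double learningRate;
    private final double lambda; // L2 regularization strength
    private final Random random;
    // Model kept in memory: one latent factor vector per user and per item.
    private final Map<Long, double[]> userFactors = new HashMap<>();
    private final Map<Long, double[]> itemFactors = new HashMap<>();

    SgdFactorizer(int numFactors, double learningRate, double lambda, long seed) {
        this.numFactors = numFactors;
        this.learningRate = learningRate;
        this.lambda = lambda;
        this.random = new Random(seed);
    }

    /** Returns the factor vector for an id, creating a small random one if unseen. */
    private double[] factorsFor(Map<Long, double[]> table, long id) {
        return table.computeIfAbsent(id, key -> {
            double[] v = new double[numFactors];
            for (int f = 0; f < numFactors; f++) {
                v[f] = 0.1 * random.nextGaussian();
            }
            return v;
        });
    }

    /** Predicted preference: dot product of user and item factors. */
    double predict(long userId, long itemId) {
        double[] p = factorsFor(userFactors, userId);
        double[] q = factorsFor(itemFactors, itemId);
        double dot = 0.0;
        for (int f = 0; f < numFactors; f++) {
            dot += p[f] * q[f];
        }
        return dot;
    }

    /** One online SGD step on a single observed rating; returns the error before the step. */
    double update(long userId, long itemId, double rating) {
        double err = rating - predict(userId, itemId);
        double[] p = factorsFor(userFactors, userId);
        double[] q = factorsFor(itemFactors, itemId);
        for (int f = 0; f < numFactors; f++) {
            double pf = p[f]; // keep the old value so both updates use the same point
            p[f] += learningRate * (err * q[f] - lambda * pf);
            q[f] += learningRate * (err * pf - lambda * q[f]);
        }
        return err;
    }
}
```

Because each call to update touches only one user vector and one item vector, new feedback can be folded in as it arrives, which is the sense in which such a model is "online". Swapping the squared-loss error term for the gradient of a logistic or Poisson likelihood would, under the same assumptions, give the other target types.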