This sounds pretty exciting. Beyond that, it is hard to say much. Can you say a bit more about how you would see introducing the code into Mahout?

On Thu, Sep 6, 2012 at 9:14 AM, Gokhan Capan <[email protected]> wrote:

> By the way, I want to mention that my thesis is advised by Ozgur Yilmazel,
> who is a founding member of the Mahout project. I conducted this study and
> kept the implementation integrable into Mahout with his guidance.
>
> On Thu, Sep 6, 2012 at 6:04 PM, Gokhan Capan <[email protected]> wrote:
>
> > Dear Mahout community,
> >
> > I would like to introduce a set of tools for recommender systems that
> > were implemented as part of my MSc thesis. This is inspired by our
> > conversations on the user list, and I tried to keep it close to the
> > existing Taste framework for possible contribution to Mahout.
> >
> > The library is available at github.com/gcapan/recommender
> > <http://github.com/gcapan>.
> >
> > The library contains Stochastic Gradient Descent based learning
> > algorithms for Matrix Factorization based recommendation.
> >
> > Core features of the library are listed below:
> >
> > 1- It handles different recommendation targets (feedback), namely:
> >    - Standard numerical recommendation with OLS Regression
> >    - Binary recommendation with Logistic Regression
> >    - Multinomial recommendation with Softmax Regression
> >    - Ordinal recommendation with Proportional Odds Model
> >    - Predicting counts with Poisson Regression (still experimental)
> >
> > 2- It may use side information from users and items if available.
> >
> > 3- It may leverage dynamic side information (this is what I called
> > it), meaning features whose values are determined at feedback time
> > (e.g. day of week for its possible effect on people's choices,
> > proximity for location-aware recommendation, etc.).
> >
> > 4- It is an online learning algorithm and thus scalable. However,
> > currently the model is stored in memory. I plan to extend it to store
> > the model in HBase, too.
> >
> > The recommenders implement Mahout's Recommender interface. For
> > experiments, I have implemented a GenericIncrementalDataModel (in
> > memory), and List-based PreferenceArrays.
> >
> > I tried to use Mahout's data structures where available. For example,
> > factor vectors and side info vectors are in Mahout's vector format.
> >
> > These algorithms are highly inspired by various influential
> > recommender system papers, especially those of Yehuda Koren. For
> > example, the Ordinal model is from Koren's OrdRec paper, except that
> > the cuts are not user-specific but global.
> >
> > I tried the numerical recommender on the MovieLens-1M dataset, and it
> > achieved around 0.851 RMSE with 150 factors and 30 iterations.
> >
> > The code is tested, but not fully documented.
> >
> > With some effort, the code can be integrated into Mahout. If it has
> > the potential to be beneficial for Mahout users, I will be happy to
> > contribute it to the ASF with your guidance.
> >
> > Any feedback is appreciated.
> >
> > Regards
> >
> > --
> > Gokhan
>
> --
> Gokhan
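
For readers who have not seen the technique, the "standard numerical recommendation" case in the quoted message (SGD-trained matrix factorization with a squared-error loss) can be sketched roughly as follows. This is only an illustration and is not taken from the library: it is written in Python for brevity (the library itself is Java, built on Taste), and the names `train_mf`, `predict`, `rmse`, and `TOY_RATINGS`, the toy data, and the learning rate and regularization values are all assumptions made for the example. Side information, the other feedback types, and the online data model are left out.

```python
import math
import random

# Hypothetical toy data: (user, item, rating) triples. Not from MovieLens.
TOY_RATINGS = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 2),
]


def predict(p, q, mean, user, item):
    """Predicted rating: global mean plus dot product of the two factor vectors."""
    return mean + sum(a * b for a, b in zip(p[user], q[item]))


def train_mf(ratings, num_factors=4, epochs=500, lr=0.05, reg=0.02, seed=0):
    """Fit user and item factors by SGD on L2-regularized squared error.

    Returns (user_factors, item_factors, global_mean). All hyperparameter
    defaults here are illustrative guesses, not values from the thesis.
    """
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    # sorted() keeps the initialization order reproducible across runs
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.gauss(0, 0.1) for _ in range(num_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(num_factors)] for i in items}

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the observations in a fresh random order
        for u, i, r in data:
            err = r - predict(p, q, mean, u, i)
            pu, qi = p[u], q[i]
            for f in range(num_factors):
                puf, qif = pu[f], qi[f]  # keep old values for a simultaneous update
                pu[f] += lr * (err * qif - reg * puf)
                qi[f] += lr * (err * puf - reg * qif)
    return p, q, mean


def rmse(ratings, p, q, mean):
    """Root mean squared error of the model over the given triples."""
    total = sum((r - predict(p, q, mean, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))
```

Calling `train_mf(TOY_RATINGS)` and then `rmse(TOY_RATINGS, p, q, mean)` gives the training error on the toy data. Because each update touches only one user vector and one item vector, the same loop can process feedback as it arrives, which is the sense in which such a learner is online.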
