On Fri, Sep 7, 2012 at 12:48 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> This sounds pretty exciting. Beyond that, it is hard to say much.
>
> Can you say a bit more about how you would see introducing the code into
> Mahout?
>

Ted, I've forked apache/mahout on GitHub, and I will merge the library into
Mahout. I believe in a week I will be able to add documentation and Mahout
jobs for experiments, and start submitting patches to JIRA.

> On Thu, Sep 6, 2012 at 9:14 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>
> > By the way, I want to mention that my thesis is advised by Ozgur Yilmazel,
> > who is a founding member of the Mahout project. I conducted this study and
> > kept the implementation integrable into Mahout with his guidance.
> >
> > On Thu, Sep 6, 2012 at 6:04 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
> >
> > > Dear Mahout community,
> > >
> > > I would like to introduce a set of tools for recommender systems that
> > > were implemented as part of my MSc thesis. This is inspired by our
> > > conversations on the user list, and I tried to stick to the existing
> > > Taste framework for possible contribution to Mahout.
> > >
> > > The library is available at
> > > github.com/gcapan/recommender<http://github.com/gcapan>.
> > >
> > > The library contains Stochastic Gradient Descent based learning
> > > algorithms for Matrix Factorization based recommendation.
> > >
> > > Core features of the library are listed below:
> > >
> > > 1- It handles different recommendation targets (feedback), namely:
> > >    - Standard numerical recommendation with OLS Regression
> > >    - Binary recommendation with Logistic Regression
> > >    - Multinomial recommendation with Softmax Regression
> > >    - Ordinal recommendation with Proportional Odds Model
> > >    - Predicting counts with Poisson Regression (still experimental)
> > >
> > > 2- It may use side information from users and items if available
> > >
> > > 3- It may leverage dynamic side information (this is what I called
> > > it), meaning features whose values are determined at feedback time
> > > (e.g. day of week for its possible effect on people's choices, proximity
> > > for location-aware recommendation, etc.)
> > >
> > > 4- It is an online learning algorithm, thus scalable. However, currently
> > > the model is stored in memory. I plan to extend it to store the model in
> > > HBase, too.
> > >
> > > The recommenders implement Mahout's Recommender interface. For
> > > experiments, I have implemented a GenericIncrementalDataModel (in
> > > memory) and List-based PreferenceArrays.
> > >
> > > I tried to use Mahout's data structures where available. For example,
> > > factor vectors and side info vectors are in Mahout's vector format.
> > >
> > > These algorithms are highly inspired by various influential Recommender
> > > System papers, especially those by Yehuda Koren. For example, the Ordinal
> > > model is from Koren's OrdRec paper, except that the cuts are not
> > > user-specific but global.
> > >
> > > I tried the numerical recommender on the MovieLens-1M dataset, and it
> > > achieved around 0.851 RMSE with 150 factors and 30 iterations.
> > >
> > > The code is tested, but not fully documented.
> > >
> > > With some effort, the code can be integrated into Mahout.
> > > If it has the potential to be beneficial for Mahout users, I will be
> > > happy to contribute it to the ASF with your guidance.
> > >
> > > Any feedback is appreciated.
> > >
> > > Regards
> > >
> > > --
> > > Gokhan
> > >
> >
> > --
> > Gokhan
>

--
Gokhan
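
For readers unfamiliar with the technique named in the announcement, SGD-based matrix factorization for the numerical (squared-error) target can be sketched roughly as follows. This is a minimal illustration in plain Python, not the library's code and not Mahout's API: the function names, hyperparameter defaults, and toy ratings are all assumptions made up for this example, and it omits bias terms, side information, and the other targets the announcement lists.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted value is the dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def train_mf(ratings, n_users, n_items, n_factors=2, n_iters=200,
             lr=0.05, reg=0.01, seed=0):
    """SGD matrix factorization with squared loss.

    ratings: list of (user_index, item_index, value) triples.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Small random initial factors for every user and item.
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    order = list(ratings)
    for _ in range(n_iters):
        rng.shuffle(order)  # visit feedback in random order on each pass
        for u, i, r in order:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    """Root mean squared error over (user, item, value) triples."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


# Toy feedback, invented purely for illustration.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
       (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = train_mf(toy, n_users=3, n_items=3)
print("training RMSE:", rmse(P, Q, toy))
```

Because each update touches only one user vector and one item vector, the same loop can process feedback as it arrives, which is the sense in which such a learner is online. For a binary target, the usual change is to replace `err` with the label minus the sigmoid of the prediction; how the library itself handles the other targets is not shown in this thread.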