Dear Mahout community,

I would like to introduce a set of tools for recommender systems that I
implemented as part of my MSc thesis. The work was inspired by our
conversations on the user list, and I tried to keep it close to the existing
Taste framework so that it could be contributed to Mahout.

The library is available at
github.com/gcapan/recommender.


The library contains learning algorithms based on Stochastic Gradient
Descent (SGD) for matrix-factorization-based recommendation.
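To make the idea concrete, here is a minimal sketch of one online SGD step
for a factorization model with squared loss. It is not the library's code:
the class name, the plain-array storage, and the learning-rate and
regularization parameters are all assumptions made for illustration.

```java
import java.util.Random;

/** Illustrative sketch only: plain-array SGD matrix factorization (hypothetical names). */
final class SgdMfSketch {
    final double[][] userFactors;
    final double[][] itemFactors;
    final double learningRate; // assumed constant step size
    final double lambda;       // assumed L2 regularization weight

    SgdMfSketch(int numUsers, int numItems, int numFactors,
                double learningRate, double lambda, long seed) {
        this.learningRate = learningRate;
        this.lambda = lambda;
        Random rnd = new Random(seed);
        this.userFactors = randomMatrix(numUsers, numFactors, rnd);
        this.itemFactors = randomMatrix(numItems, numFactors, rnd);
    }

    /** Small random initial factors so the first predictions are near zero. */
    static double[][] randomMatrix(int rows, int cols, Random rnd) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int k = 0; k < cols; k++) {
                row[k] = 0.1 * rnd.nextGaussian();
            }
        }
        return m;
    }

    /** Predicted rating: dot product of the user and item factor vectors. */
    double predict(int user, int item) {
        double dot = 0.0;
        for (int k = 0; k < userFactors[user].length; k++) {
            dot += userFactors[user][k] * itemFactors[item][k];
        }
        return dot;
    }

    /** One online update for a single observed (user, item, rating) triple. */
    void update(int user, int item, double rating) {
        double err = rating - predict(user, item);
        double[] p = userFactors[user];
        double[] q = itemFactors[item];
        for (int k = 0; k < p.length; k++) {
            double pk = p[k]; // keep old values so both updates use the same state
            double qk = q[k];
            p[k] += learningRate * (err * qk - lambda * pk);
            q[k] += learningRate * (err * pk - lambda * qk);
        }
    }

    public static void main(String[] args) {
        SgdMfSketch model = new SgdMfSketch(2, 2, 3, 0.05, 0.01, 42L);
        for (int i = 0; i < 200; i++) {
            model.update(0, 1, 4.0);
        }
        System.out.println("prediction after training: " + model.predict(0, 1));
    }
}
```

Because each update touches only one user vector and one item vector, the
model can be trained one feedback event at a time.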

Core features of the library are listed below:

1- It handles different recommendation targets (feedback types), namely:
    - Standard numerical recommendation with OLS Regression
    - Binary recommendation with Logistic Regression
    - Multinomial recommendation with Softmax Regression
    - Ordinal recommendation with Proportional Odds Model
    - Predicting counts with Poisson Regression (still experimental)

2- It can use side information about users and items, if available.

3- It can leverage what I call dynamic side information: features whose
values are determined at feedback time (e.g. day of week, for its possible
effect on people's choices; proximity, for location-aware recommendation;
etc.).

4- It is an online learning algorithm and thus scalable. Currently,
however, the model is stored in memory. I plan to extend it to store the
model in HBase as well.
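As a rough illustration of items 1 to 3, the sketch below shows one way a
raw score could combine latent factors with side-information features, and
how that score maps to an expected value for three of the feedback types.
Adding a linear side-information term to the factor dot product is an
assumption made for this example, not necessarily how the library combines
them, and all names are hypothetical. The softmax and ordinal targets are
left out to keep it short.

```java
/** Illustrative sketch only: score with side information plus per-target mean (hypothetical names). */
final class FeedbackScoreSketch {
    enum Target { NUMERICAL, BINARY, COUNT }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Assumed form: factor interaction plus a linear term over side features. */
    static double score(double[] userFactors, double[] itemFactors,
                        double[] sideWeights, double[] sideFeatures) {
        return dot(userFactors, itemFactors) + dot(sideWeights, sideFeatures);
    }

    /** Expected feedback for a raw score (inverse of the canonical link). */
    static double mean(Target target, double score) {
        switch (target) {
            case NUMERICAL:
                return score;                           // OLS: identity
            case BINARY:
                return 1.0 / (1.0 + Math.exp(-score));  // logistic: sigmoid
            case COUNT:
                return Math.exp(score);                 // Poisson: exp
            default:
                throw new IllegalArgumentException("unknown target: " + target);
        }
    }

    /** For these three models, the log-likelihood gradient w.r.t. the score is observed minus expected. */
    static double scoreGradient(Target target, double observed, double score) {
        return observed - mean(target, score);
    }

    public static void main(String[] args) {
        double[] userFactors = {0.3, -0.2};
        double[] itemFactors = {0.5, 0.1};
        // Hypothetical dynamic side information: one-hot day of week at feedback time.
        double[] dayOfWeek = {0, 0, 0, 0, 0, 1, 0};
        double[] dayWeights = {0.0, 0.0, 0.0, 0.0, 0.1, 0.4, 0.3};
        double s = score(userFactors, itemFactors, dayWeights, dayOfWeek);
        System.out.println("P(positive feedback) = " + mean(Target.BINARY, s));
    }
}
```

Since the gradient with respect to the score has the same form for all three
targets, the same SGD loop can be reused and only the mean function changes.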


The recommenders implement Mahout's Recommender interface. For
experiments, I have implemented a GenericIncrementalDataModel (in memory)
and list-based PreferenceArrays.

I tried to use Mahout's data structures where available. For example,
factor vectors and side info vectors are in Mahout's vector format.

These algorithms are heavily inspired by several influential recommender
system papers, especially those by Yehuda Koren. For example, the ordinal
model follows Koren's OrdRec paper, except that the cuts are global rather
than user-specific.
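For readers unfamiliar with the proportional odds model, the following is a
minimal sketch of how a single set of global cuts turns a raw score into
probabilities over ordered rating levels. It uses the standard formulation
P(rating <= k) = sigmoid(cut_k - score); the class and method names are
hypothetical and the code is not taken from the library.

```java
/** Illustrative sketch only: proportional odds with global cuts (hypothetical names). */
final class ProportionalOddsSketch {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /**
     * Probabilities of each ordinal level for a given score.
     * cuts must be ascending; K - 1 cuts yield K levels.
     */
    static double[] levelProbabilities(double score, double[] cuts) {
        int levels = cuts.length + 1;
        double[] probs = new double[levels];
        double previousCumulative = 0.0;
        for (int k = 0; k < levels; k++) {
            // P(rating <= level k); the last level always has cumulative probability 1.
            double cumulative = (k < cuts.length) ? sigmoid(cuts[k] - score) : 1.0;
            probs[k] = cumulative - previousCumulative;
            previousCumulative = cumulative;
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] globalCuts = {-1.0, 1.0}; // shared by all users in this sketch
        double[] probs = levelProbabilities(0.0, globalCuts);
        for (int k = 0; k < probs.length; k++) {
            System.out.println("P(level " + k + ") = " + probs[k]);
        }
    }
}
```

With user-specific cuts, as in OrdRec, the cuts array would be looked up per
user instead of being shared.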

I tried the numerical recommender on the MovieLens-1M dataset, and it
achieved an RMSE of around 0.851 with 150 factors and 30 iterations.

The code is tested, but not fully documented.

With some effort, the code can be integrated into Mahout. If it has the
potential to be beneficial for Mahout users, I will be happy to contribute
it to the ASF with your guidance.

Any feedback is appreciated.

Regards

-- 
Gokhan
