Dear all,
I sent this mail to the MapReduce mailing list yesterday, but then realized
that this list may be a better place to post it. I apologize if this email has
gone to the wrong place, and I would appreciate it if anyone corrects me.
I would like to introduce a new project, "BigO", to you all. It is a set of
tools that uses multiplicative methods to scale up several machine learning
algorithms on MapReduce, and it grew out of my master's thesis, "Upscaling
Several Key Machine Learning Algorithms".
Around March last year, I came across a paper published by Microsoft Research
http://research.microsoft.com/apps/pubs/default.aspx?id=119077
and found that the generalized approach it introduces is fascinating and
adapts well to several other machine learning algorithms. In my master's
thesis, I therefore designed two generic multiplicative models on MapReduce and
implemented three algorithms (NMF, SVM and PageRank) under them. This may not
count as a "ground-breaking" discovery, since I have read many papers
discussing similar ideas, and I do not claim that our implementation is faster
than any specialized implementation. But I believe it is worth trying to bring
several machine learning implementations under the same model and to provide
generic solutions.
Our methods are designed to solve the following two multiplicative
problems:
1. Similarity Measure: Computing a similarity measure is a common task when
given a training set of feature vectors. This model is characterized by a dense
output matrix and large intermediate output.
2. Iterative Multiplication: Some solutions (e.g. optimization problems)
require several matrix multiplications to be computed iteratively, so a
lightweight matrix multiplication implementation can be designed for them.
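To make the two models concrete, here is a minimal single-machine sketch in
plain Python. It is not the BigO code and does not use MapReduce; the function
names are hypothetical, and the dot product is only assumed as the similarity
measure for illustration.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def similarity_matrix(features):
    """Model 1 (sketch): all-pairs dot-product similarity, S = X * X^T.

    For n feature vectors the output is a dense n-by-n matrix, which is
    why the output and intermediate data grow quickly with n.
    """
    return matmul(features, transpose(features))


def iterate_multiplication(matrix, steps):
    """Model 2 (sketch): repeat v <- M * v, starting from a uniform vector.

    This is the multiplication pattern behind PageRank-style iterations;
    damping and convergence checks are left out to keep the sketch short.
    """
    n = len(matrix)
    v = [[1.0 / n] for _ in range(n)]  # column vector
    for _ in range(steps):
        v = matmul(matrix, v)
    return [row[0] for row in v]


if __name__ == "__main__":
    # Three toy feature vectors (made-up data).
    features = [[1, 0], [0, 1], [1, 1]]
    print(similarity_matrix(features))

    # A small column-stochastic matrix (made-up data).
    transition = [[0.5, 0.5], [0.5, 0.5]]
    print(iterate_multiplication(transition, steps=10))
```

In a MapReduce setting the same products would have to be partitioned across
mappers and reducers; that partitioning is not shown here.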
Using these two models, we show that several learning algorithms can be
adapted efficiently. We are preparing a conference paper submission on this
work.
For anyone who may be interested, I invite you to visit our website:
http://code.google.com/p/bigo2/
The latest release is available there, and several toy examples can be tried
(please see the instructions in the release package). I plan to make a new
release containing more algorithms in Feb, and a detailed summary of the
implementation should be uploaded soon.
I'm currently looking for support, ideas and opinions on this project, and
anyone interested is welcome to email me at [email protected]
Best wishes,
Song Liu