GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/5591

    [SPARK-7008] An implementation of Factorization Machines based on Scala and Spark MLlib.

    Factorization Machines are a state-of-the-art machine learning model for multi-linear regression and are widely used in recommendation systems.
    The Factorization Machines algorithm and its C++ implementation, LibFM, have performed well in recommendation competitions in recent years.

    An FMModel consists of three parts:
      an intercept (optional),
      one-way interaction weights, as in other linear models (optional),
      a numFactors * numFeatures matrix holding the factors of each feature (mandatory).

    I implemented the training algorithm with SGD, using MLlib's GradientDescent. To use MLlib's SGD, the model is encoded as a dense vector during training.

    Ref:
    http://libfm.org/
    http://doi.acm.org/10.1145/2168752.2168771
    http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark-factorization-machine master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5591.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5591

----
commit 18c3a494c4ccca1843748023f21cda34c5ecd9a8
Author: zhengruifeng <ruife...@foxmail.com>
Date:   2015-04-20T01:43:42Z

    add FactorizationMachine.scala

commit 2b49d74805e02a21cf0e5a0aa022493f19868a4b
Author: zhengruifeng <ruife...@foxmail.com>
Date:   2015-04-20T09:13:04Z

    Add Test

commit dfd0c3b54ec016b8cbc233d44432c9e38ed9570e
Author: zhengruifeng <ruife...@foxmail.com>
Date:   2015-04-20T10:46:35Z

    Add comments

commit ba33af8fa67c568b5130392e7c3a38456405cfc0
Author: zhengruifeng <ruife...@foxmail.com>
Date:   2015-04-20T10:53:42Z

    some comments

----
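
For readers unfamiliar with the model, the three parts of an FMModel and the dense-vector encoding described in the pull request can be illustrated with a small sketch in plain Scala. This is not the code in the pull request: the names FMSketch, FMParams and predict, and the flat layout (factors first, then one-way weights, then the intercept), are assumptions made only for illustration. The pairwise term uses the standard Factorization Machines identity, so the cost is linear in the number of non-zero features.

```scala
// Illustrative sketch only. The names and the flat layout below are
// assumptions, not the classes or encoding used in this pull request.
object FMSketch {

  /** Hypothetical flat layout: [factors (feature-major) | one-way weights | intercept]. */
  final case class FMParams(numFeatures: Int, numFactors: Int, weights: Array[Double]) {
    require(weights.length == numFeatures * numFactors + numFeatures + 1,
      "weights must hold the factors, the one-way weights, and the intercept")

    // Factor f of the given feature.
    def factor(feature: Int, f: Int): Double = weights(feature * numFactors + f)
    // One-way (linear) weight of the given feature.
    def linear(feature: Int): Double = weights(numFeatures * numFactors + feature)
    // Global intercept, stored in the last slot.
    def intercept: Double = weights(weights.length - 1)
  }

  /** Prediction for a sparse input given as (featureIndex, value) pairs. */
  def predict(p: FMParams, x: Seq[(Int, Double)]): Double = {
    val oneWay = x.map { case (i, v) => p.linear(i) * v }.sum

    // Pairwise interactions via the identity
    //   sum_{i<j} <v_i, v_j> x_i x_j
    //     = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2 ]
    val pairwise = (0 until p.numFactors).map { f =>
      var sum = 0.0
      var sumSq = 0.0
      x.foreach { case (i, v) =>
        val t = p.factor(i, f) * v
        sum += t
        sumSq += t * t
      }
      0.5 * (sum * sum - sumSq)
    }.sum

    p.intercept + oneWay + pairwise
  }
}
```

As a usage example, with two features and one factor, FMSketch.FMParams(2, 1, Array(1.0, 2.0, 0.5, -0.5, 0.1)) holds the factors 1.0 and 2.0, the one-way weights 0.5 and -0.5, and the intercept 0.1. Calling FMSketch.predict on Seq(0 -> 1.0, 1 -> 3.0) gives 0.1 + (0.5 - 1.5) + (1.0 * 2.0 * 1.0 * 3.0), which is approximately 5.1. Keeping every parameter in one flat array is what allows a generic vector-based optimizer such as MLlib's GradientDescent to update the whole model at once.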