Hello Everyone,

I'm a senior-year computer engineering student in Turkey.
My main areas of interest are cloud computing and machine learning.

I've been working with Apache Spark through the Scala API for a few months. My 
projects involved using MLlib for a movie recommendation system and a stock 
prediction model. I would be interested in working on Spark for GSoC 2015. From 
my experience, there are a few enhancements that could be made:
 - Learning models could be standardized in a hierarchical manner to increase 
code quality and make future algorithm implementations easier. For example, 
even though it lives in the GraphX library, SVD++ doesn't have a model 
implementation; it currently returns only the raw pieces of the computation, 
and the documentation isn't clear either (apart from the link to the SVD++ 
paper). I've sketched the kind of wrapper I have in mind after this list.
 - New algorithms could be implemented, such as restricted Boltzmann 
machines, tensor models and tensor factorization for the recommendation 
sub-library, and multi-class SVM classification (see the one-vs-rest sketch 
below).
 - Testing documentation was close to nonexistent (only a link to a blog 
post). Each test creates a new SparkContext, so workarounds were necessary to 
keep testing productive (e.g. the pass/fail/refactor cycle was taking a long 
time); I've included the one I used below.
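
To make the first point concrete, here is a rough sketch of the kind of model 
class I have in mind for SVD++. To be clear, SVDPlusPlusModel and its 
train/predict methods are my own invention, not an existing API; the sketch 
assumes the vertex attribute layout of the current GraphX implementation 
(latent factors, combined user factors, bias, norm) and its prediction rule.

import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.graphx.lib.SVDPlusPlus
import org.apache.spark.rdd.RDD

// Hypothetical wrapper: SVDPlusPlusModel does not exist in Spark today.
class SVDPlusPlusModel(
    val graph: Graph[(Array[Double], Array[Double], Double, Double), Double],
    val mean: Double) extends Serializable {

  // r_ui = mu + b_u + b_i + q_i . (p_u + |N(u)|^-1/2 * sum(y_j)),
  // reading q_i from the item vertex and the combined factors from the user.
  def predict(user: Long, item: Long): Double = {
    val attrs = graph.vertices
      .filter { case (id, _) => id == user || id == item }
      .collect().toMap
    val (u, i) = (attrs(user), attrs(item))
    mean + u._3 + i._3 + i._1.zip(u._2).map { case (a, b) => a * b }.sum
  }
}

object SVDPlusPlusModel {
  // Run the existing GraphX implementation and keep the trained graph around.
  def train(edges: RDD[Edge[Double]], conf: SVDPlusPlus.Conf): SVDPlusPlusModel = {
    val (g, mean) = SVDPlusPlus.run(edges, conf)
    new SVDPlusPlusModel(g, mean)
  }
}

A caller would then train once and call predict(user, item) instead of digging 
through the returned graph's vertex attributes by hand.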
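
For multi-class SVM, a one-vs-rest reduction on top of the existing binary 
SVMWithSGD seems like a natural first step. Again, this is only a sketch; 
OneVsRestSVM and its method names are hypothetical:

import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object OneVsRestSVM {

  // Train one binary SVM per class; labels are assumed to be 0.0 .. numClasses - 1.
  def train(data: RDD[LabeledPoint], numClasses: Int, numIterations: Int): Array[SVMModel] = {
    data.cache() // each binary subproblem rescans the data
    (0 until numClasses).map { c =>
      val binary = data.map(p =>
        LabeledPoint(if (p.label == c.toDouble) 1.0 else 0.0, p.features))
      val model = SVMWithSGD.train(binary, numIterations)
      model.clearThreshold() // keep raw margins so scores are comparable across classes
      model
    }.toArray
  }

  // Predict the class whose binary model reports the largest margin.
  def predict(models: Array[SVMModel], features: Vector): Double =
    models.zipWithIndex.maxBy { case (m, _) => m.predict(features) }._2.toDouble
}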
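
As for the testing workaround, I ended up with a trait along these lines 
(similar in spirit to the shared-context traits in Spark's own test sources; 
the names here are mine):

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, Suite}

// Share one local SparkContext across a whole suite instead of paying the
// startup cost in every single test.
trait SharedSparkContext extends BeforeAndAfterAll { self: Suite =>

  @transient var sc: SparkContext = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName(suiteName))
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop() // avoid leaking contexts between suites
    sc = null
    super.afterAll()
  }
}

A test suite then just mixes in SharedSparkContext and uses sc directly, which 
cut the pass/fail/refactor cycle down considerably for me.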
But don't get the idea that I dislike Spark for lacking these features. I 
loved working with Spark, and I'd be happy to work on improving it, mainly 
the model hierarchy and new machine learning algorithms for Spark MLlib and 
GraphX, if anyone would be interested in mentoring. I'll work on a proposal 
that gives more details about the algorithms and a timeline; I just wanted 
to give a heads-up beforehand.
If you have any questions, please feel free to ask.
Thanks in advance.

Tamer Tas
                                          
