GitHub user kellrott opened a pull request:

    https://github.com/apache/spark/pull/1292

    Mllib grouped optimization

    The purpose of this patch is to enable MLlib to better handle scenarios 
where the user wants to run learning on multiple feature/label sets at the 
same time. Rather than scheduling each learning task separately, this patch 
lets the user create a single RDD with an Int key that identifies the 'group' 
each entry belongs to.
    
    This patch establishes the GroupedOptimizer trait, for which 
GroupedGradientDescent has been implemented. This system differs from the 
original Optimizer trait in that the original optimize method accepts 
RDD[(Double, Vector)], whereas the new GroupedOptimizer accepts 
RDD[(Int, (Double, Vector))].
    The difference is that the GroupedOptimizer uses a 'group' ID key in the 
RDD to multiplex multiple optimization operations in the same RDD.
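
    As a rough illustration of the multiplexing idea (not the code in this 
patch), the sketch below runs one least-squares gradient step per group over 
keyed data. Local Scala collections stand in for the RDD, and the names 
GroupedGradientSketch and step, the squared-error loss, and the per-group 
weight map are all assumptions made for the example.

```scala
// Hypothetical sketch: a local Seq stands in for RDD[(Int, (Double, Vector))].
object GroupedGradientSketch {
  type Example = (Double, Array[Double]) // (label, features)

  // Performs one least-squares gradient step for every group at once.
  // Weights are keyed by the same Int group ID that keys the data.
  def step(
      data: Seq[(Int, Example)],
      weights: Map[Int, Array[Double]],
      stepSize: Double): Map[Int, Array[Double]] = {
    data.groupBy(_._1).map { case (group, rows) =>
      val w = weights(group)
      val grad = new Array[Double](w.length)
      for ((_, (label, x)) <- rows) {
        // Prediction error of the current weights on this example.
        val err = w.zip(x).map { case (wi, xi) => wi * xi }.sum - label
        for (i <- w.indices) grad(i) += err * x(i)
      }
      val n = rows.size.toDouble
      // Move against the group's average gradient.
      group -> w.indices.map(i => w(i) - stepSize * grad(i) / n).toArray
    }
  }
}
```

    Each group only ever reads and updates its own weight vector, so a 
single pass over the keyed data advances every optimization problem together.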
    
    This patch also establishes the GroupedGeneralizedLinearAlgorithm trait, 
for which the 'run' method has had the RDD[LabeledPoint] input replaced with 
RDD[(Int,LabeledPoint)].
    
    This patch also provides a unit test and a utility that takes the results 
of MLUtils.kFold and turns them into a single grouped RDD, ready for 
simultaneous learning. 
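
    One way such a utility could look is sketched below. This is a guess at 
the approach, not the helper shipped in this patch: the names 
KFoldGroupingSketch and groupedTrainingSets are hypothetical, and only the 
training split of each fold is kept. MLUtils.kFold and RDD.union are existing 
Spark calls.

```scala
import scala.reflect.ClassTag

import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

// Hypothetical helper: tags each fold's training split with its fold index
// and unions the tagged splits into one grouped RDD.
object KFoldGroupingSketch {
  def groupedTrainingSets[T: ClassTag](
      data: RDD[T],
      numFolds: Int,
      seed: Int): RDD[(Int, T)] = {
    // kFold returns one (training, validation) pair per fold.
    val folds = MLUtils.kFold(data, numFolds, seed)
    val tagged = folds.zipWithIndex.map { case ((train, _), foldId) =>
      train.map(x => (foldId, x))
    }
    tagged.reduce(_ union _)
  }
}
```

    With an RDD[LabeledPoint] as input, the result has the 
RDD[(Int, LabeledPoint)] shape described above, with the fold index acting as 
the group ID.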

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kellrott/spark mllib-grouped

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1292.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1292
    
----
commit 664196a78cece095ac78293379503afc9f14c2c9
Author: Kyle Ellrott <[email protected]>
Date:   2014-07-01T16:45:16Z

    Adding files to do grouped optimization (Gradient Decent right now)

commit f99c8abf5322ddb73bd5f56a53a4784d5f20e8cf
Author: Kyle Ellrott <[email protected]>
Date:   2014-07-01T20:53:03Z

    Adding GroupedGeneralizedLinearAlgorithm class

commit 02a192adb5daf8b1812bac7ad6ba0b2233040208
Author: Kyle Ellrott <[email protected]>
Date:   2014-07-02T20:21:22Z

    Working GroupedSVM and unit tests

----


