[ https://issues.apache.org/jira/browse/SPARK-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-2372.
------------------------------
    Resolution: Won't Fix

Sounds like a Won't Fix given the PR discussion.

> Grouped Optimization/Learning
> -----------------------------
>
>                 Key: SPARK-2372
>                 URL: https://issues.apache.org/jira/browse/SPARK-2372
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.0.1, 1.0.2, 1.1.0
>            Reporter: Kyle Ellrott
>
> The purpose of this patch is to enable MLlib to better handle scenarios
> where the user wants to perform learning on multiple feature/label sets at
> the same time. Rather than schedule each learning task separately, this
> patch lets the user create a single RDD with an Int key identifying the
> 'group' each entry belongs to.
> This patch establishes the GroupedOptimizer trait, for which
> GroupedGradientDescent has been implemented. It differs from the original
> Optimizer trait in that the original optimize method accepts
> RDD[(Double, Vector)], while the new GroupedOptimizer accepts
> RDD[(Int, (Double, Vector))].
> The difference is that the GroupedOptimizer uses a 'group' ID key in the
> RDD to multiplex multiple optimization operations within the same RDD.
> This patch also establishes the GroupedGeneralizedLinearAlgorithm trait,
> in which the 'run' method's RDD[LabeledPoint] input is replaced with
> RDD[(Int, LabeledPoint)].
> This patch also provides a unit test and a utility that takes the results
> of MLUtils.kFold and turns them into a single grouped RDD, ready for
> simultaneous learning.
> https://github.com/apache/spark/pull/1292

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
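The core idea above is that a single keyed dataset multiplexes several independent optimization runs: each Int key selects a group, and each group converges to its own weight vector. Since the PR was never merged, here is a minimal, Spark-free sketch of that idea in plain Scala, using `groupBy` over a local `Seq` in place of keyed RDD operations; the object name, the least-squares loss, and the update rule are illustrative assumptions, not the actual PR code.

```scala
// Sketch of grouped optimization: entries keyed by an Int group ID, with one
// pass performing an independent gradient-descent step per group. This stands
// in for the proposed RDD[(Int, (Double, Vector))] input, using a local Seq
// and Array[Double] features instead of Spark types.
object GroupedSketch {
  type Labeled = (Double, Array[Double]) // (label, features)

  // One gradient-descent step per group; groupBy plays the role of the
  // keyed RDD aggregation a Spark implementation would use.
  def groupedStep(data: Seq[(Int, Labeled)],
                  weights: Map[Int, Array[Double]],
                  stepSize: Double): Map[Int, Array[Double]] =
    data.groupBy(_._1).map { case (group, entries) =>
      val w = weights(group)
      // Accumulate the least-squares gradient over this group's entries.
      val grad = new Array[Double](w.length)
      entries.foreach { case (_, (label, features)) =>
        val err = features.zip(w).map { case (x, wi) => x * wi }.sum - label
        for (i <- grad.indices) grad(i) += err * features(i)
      }
      // Average the gradient and take one step for this group only.
      group -> w.indices.map(i => w(i) - stepSize * grad(i) / entries.size).toArray
    }

  def main(args: Array[String]): Unit = {
    // Two groups learned simultaneously: y = 2x (group 0) and y = -x (group 1).
    val data = Seq(
      (0, (2.0, Array(1.0))), (0, (4.0, Array(2.0))),
      (1, (-1.0, Array(1.0))), (1, (-2.0, Array(2.0))))
    var w = Map(0 -> Array(0.0), 1 -> Array(0.0))
    for (_ <- 1 to 200) w = groupedStep(data, w, stepSize = 0.1)
    println(f"group0=${w(0)(0)}%.2f group1=${w(1)(0)}%.2f")
  }
}
```

Both groups are updated in a single traversal of the data, which is the scheduling benefit the issue describes: one job instead of one job per feature/label set.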