[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-27 Thread Rakesh Chalasani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514905#comment-14514905
 ] 

Rakesh Chalasani commented on SPARK-5256:
-

Just a thought: as far as I know (please correct me if I am wrong), the 
optimizer usually takes the number of iterations as its only stopping 
criterion. For example, in classification or regression tasks, there is no way 
to do early stopping using a validation set, or to detect whether the rate of 
change in the loss function has flattened. 

To facilitate this, how about having a set of functions that compute a 
tolerance at the end of each iteration, or after a set number of iterations, to 
enable early stopping? An alternative is to not have this at all and leave it 
to the ml-pipeline CrossValidator. 
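
A minimal sketch of what such pluggable stopping functions might look like. 
Every name here (StoppingCriterion, MaxIterations, RelativeLossChange, iterate) 
is hypothetical and not part of MLlib; the loss history is the only input the 
criteria are assumed to see.

```scala
// Hypothetical sketch; none of these names exist in MLlib.
object StoppingCriteriaSketch {
  // A criterion inspects the losses recorded after each completed iteration.
  trait StoppingCriterion {
    def shouldStop(lossHistory: IndexedSeq[Double]): Boolean
  }

  // Stop once a fixed iteration budget is spent.
  final case class MaxIterations(n: Int) extends StoppingCriterion {
    def shouldStop(lossHistory: IndexedSeq[Double]): Boolean =
      lossHistory.length >= n
  }

  // Stop when the relative change between the last two losses is at most tol.
  final case class RelativeLossChange(tol: Double) extends StoppingCriterion {
    def shouldStop(lossHistory: IndexedSeq[Double]): Boolean =
      lossHistory.length >= 2 && {
        val prev = lossHistory(lossHistory.length - 2)
        val curr = lossHistory.last
        math.abs(prev - curr) <= tol * math.max(math.abs(prev), 1e-12)
      }
  }

  // Call `step` (which returns the loss after iteration i) until any criterion fires.
  def iterate(step: Int => Double,
              criteria: Seq[StoppingCriterion]): IndexedSeq[Double] = {
    var history = Vector.empty[Double]
    while (!criteria.exists(_.shouldStop(history))) {
      history = history :+ step(history.length)
    }
    history
  }

  def main(args: Array[String]): Unit = {
    // Toy loss that decays geometrically towards 1.0 and then flattens.
    val losses = iterate(
      i => 1.0 + math.pow(0.5, i),
      Seq(MaxIterations(100), RelativeLossChange(1e-6)))
    println(s"stopped after ${losses.length} iterations, final loss ${losses.last}")
  }
}
```

Checking the criteria only every k iterations, as suggested above, would be a 
small change to the loop condition.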

 Improving MLlib optimization APIs
 -

 Key: SPARK-5256
 URL: https://issues.apache.org/jira/browse/SPARK-5256
 Project: Spark
  Issue Type: Umbrella
  Components: MLlib
Affects Versions: 1.2.0
Reporter: Joseph K. Bradley

 *Goal*: Improve APIs for optimization
 *Motivation*: There have been several disjoint mentions of improving the 
 optimization APIs to make them more pluggable, extensible, etc.  This JIRA is 
 a place to discuss what API changes are necessary for the long term, and to 
 provide links to other relevant JIRAs.
 Eventually, I hope this leads to a design doc outlining:
 * current issues
 * requirements such as supporting many types of objective functions, 
 optimization algorithms, and parameters to those algorithms
 * ideal API
 * breakdown of smaller JIRAs needed to achieve that API
 I will soon create an initial design doc, and I will try to watch this JIRA 
 and include ideas from JIRA comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-27 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514963#comment-14514963
 ] 

Joseph K. Bradley commented on SPARK-5256:
--

I'd disagree.  There are almost always ways to do early stopping, by looking 
at the change in either the loss or the parameters.  You can also do early 
stopping based on a validation set, but that is outside the realm of 
optimization and can be discussed in a different JIRA.

I think the current efforts to support (a) max iterations and (b) convergence 
tolerance parameters should suffice for stopping criteria.
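
To make (a) and (b) concrete, here is a small sketch of a gradient descent loop 
that stops on whichever comes first. It is plain Scala with Array[Double] in 
place of MLlib vectors; the function name and the exact form of the tolerance 
test (relative change in the parameters) are assumptions for illustration, not 
the MLlib implementation.

```scala
// Illustrative only: plain gradient descent with two stopping criteria.
object ConvergenceSketch {
  def minimize(
      gradient: Array[Double] => Array[Double],
      initial: Array[Double],
      stepSize: Double,
      maxIter: Int,
      tol: Double): (Array[Double], Int) = {
    var w = initial.clone()
    var iter = 0
    var converged = false
    while (iter < maxIter && !converged) {        // (a) max iterations
      val g = gradient(w)
      val next = w.zip(g).map { case (wi, gi) => wi - stepSize * gi }
      val delta = norm(next.zip(w).map { case (a, b) => a - b })
      // (b) convergence tolerance on the relative change in the parameters
      converged = delta < tol * math.max(norm(next), 1.0)
      w = next
      iter += 1
    }
    (w, iter)
  }

  private def norm(v: Array[Double]): Double = math.sqrt(v.map(x => x * x).sum)

  def main(args: Array[String]): Unit = {
    // Minimise f(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
    val target = Array(3.0, -1.0)
    val (w, iters) = minimize(
      x => x.zip(target).map { case (a, b) => a - b },
      Array(0.0, 0.0), stepSize = 0.5, maxIter = 1000, tol = 1e-8)
    println(s"w = ${w.mkString(", ")} after $iters iterations")
  }
}
```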




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-14 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494528#comment-14494528
 ] 

Joseph K. Bradley commented on SPARK-5256:
--

[~avulanov] I wonder, though, if SGD is that important for Spark.  For convex 
problems (for which the stochasticity of SGD is not that helpful), the overhead 
of network communication makes non-stochastic methods more appealing; as long 
as you are taking the hit of network communication, you might as well make a 
pass over all data, especially if it lets us use faster methods (such as 
accelerated gradient methods).

Of course, this doesn't apply as much to non-convex problems or to problems for 
which gradient computations are very expensive.

What use case do you want to optimize for?




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-14 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494518#comment-14494518
 ] 

Alexander Ulanov commented on SPARK-5256:
-

Probably the main issue for MLlib is that iterative algorithms are implemented 
with the aggregate function. It has a fixed overhead of around half a second, 
which limits its usefulness when one needs to run a large number of iterations. 
This is the case for the larger datasets that Spark is intended for. The 
problem gets worse with stochastic algorithms, because there is no good way to 
randomly pick data from an RDD and one has to scan through it sequentially.
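
The pattern being described can be sketched as follows. Plain Scala collections 
stand in for an RDD here (an assumption made to keep the example 
self-contained), so the fixed per-job overhead is not reproduced; the point is 
only that every iteration performs one full seqOp/combOp pass over all 
partitions.

```scala
// Illustrative only: Seq[Seq[Point]] stands in for the partitions of an RDD.
object AggregatePerIterationSketch {
  type Point = (Double, Double) // (feature x, label y)

  // One "job": fold each partition locally (seqOp), then merge the partials (combOp).
  def gradientPass(partitions: Seq[Seq[Point]], w: Double): (Double, Long) = {
    val seqOp = (acc: (Double, Long), p: Point) =>
      (acc._1 + (w * p._1 - p._2) * p._1, acc._2 + 1L)
    val combOp = (a: (Double, Long), b: (Double, Long)) =>
      (a._1 + b._1, a._2 + b._2)
    partitions.map(_.foldLeft((0.0, 0L))(seqOp)).foldLeft((0.0, 0L))(combOp)
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq((1.0, 2.0), (2.0, 4.0)), Seq((3.0, 6.0)))
    var w = 0.0
    for (_ <- 0 until 50) { // every iteration triggers one full pass
      val (gradSum, n) = gradientPass(partitions, w)
      w -= 0.1 * gradSum / n
    }
    println(f"w = $w%.4f")
  }
}
```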




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-14 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494562#comment-14494562
 ] 

Shivaram Venkataraman commented on SPARK-5256:
--

FWIW, I also think that API design is somewhat orthogonal to implementation and 
performance concerns. [~avulanov] While I see your point about overheads, 
sampling, etc., I am not sure it relates to the API design?




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-14 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494568#comment-14494568
 ] 

Alexander Ulanov commented on SPARK-5256:
-

The size of data that requires using Spark suggests that the learning algorithm 
will be limited by time rather than by data. According to the paper "The 
Tradeoffs of Large Scale Learning", SGD has significantly faster convergence 
than batch GD in this case. My use case is machine learning on large data, in 
particular time series.
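
A toy, single-machine sketch of the comparison, under the assumption that both 
methods get the same budget of per-point gradient evaluations. It does not 
reproduce the paper's analysis and ignores the communication costs raised 
earlier in the thread; the data, step size, and function name are made up for 
illustration.

```scala
import scala.util.Random

// Illustrative only: least-squares fit of y = w * x on noise-free data.
object SgdVsBatchSketch {
  // Each step averages the gradient over a batch. If batchSize covers the
  // whole dataset the step is a full-batch step; otherwise points are drawn
  // at random with replacement (mini-batch SGD).
  def fit(data: IndexedSeq[(Double, Double)], batchSize: Int, steps: Int,
          stepSize: Double, seed: Long): Double = {
    val rng = new Random(seed)
    var w = 0.0
    for (_ <- 0 until steps) {
      val batch =
        if (batchSize >= data.length) data
        else IndexedSeq.fill(batchSize)(data(rng.nextInt(data.length)))
      val grad = batch.map { case (x, y) => (w * x - y) * x }.sum / batch.length
      w -= stepSize * grad
    }
    w
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 1000).map { i => val x = i / 1000.0; (x, 2.0 * x) }
    // Same budget of 10,000 per-point gradient evaluations for both runs.
    val batch = fit(data, batchSize = 1000, steps = 10, stepSize = 0.5, seed = 42L)
    val sgd = fit(data, batchSize = 10, steps = 1000, stepSize = 0.5, seed = 42L)
    println(s"batch GD error: ${math.abs(batch - 2.0)}")
    println(s"mini-batch SGD error: ${math.abs(sgd - 2.0)}")
  }
}
```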




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-14 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494579#comment-14494579
 ] 

Alexander Ulanov commented on SPARK-5256:
-

[~shivaram] Indeed, performance is orthogonal to the API design. Though 
well-designed things should work efficiently, don't you think? :)




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-14 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494732#comment-14494732
 ] 

Shivaram Venkataraman commented on SPARK-5256:
--

Yeah, but performance improvement is a continuous process, while API design is 
hopefully less frequent and longer term. Also, I think we should just track 
them in separate JIRA issues. 




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-12 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491891#comment-14491891
 ] 

Joseph K. Bradley commented on SPARK-5256:
--

Added link to [SPARK-1227], which discusses ML diagnostics and brings up the 
question of what loss functions should be provided as Loss classes rather than 
via the ClassificationMetrics and RegressionMetrics classes.




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-08 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486206#comment-14486206
 ] 

Joseph K. Bradley commented on SPARK-5256:
--

*Q*: Should Optimizer store Gradient and Updater?

*Proposal*: No.  Gradient and Updater (regularization type) are model 
parameters, not Optimizer parameters.  The generalized linear algorithm should 
take Optimizer, Gradient, and regularization type as separate parameters.  
Internally, the GLM can pass the Gradient and reg type to the Optimizer, either 
as method parameters:
{code}
Optimizer.step(currentWeights, gradient, regType)
{code}
or by constructing a specific optimizer
{code}
val optimizer = new Optimizer(gradient, regType)
val newWeights = optimizer.step(currentWeights)
{code}
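
For illustration, the two options might be fleshed out as below. This is a 
sketch under stated assumptions, not the proposed MLlib API: Array[Double] 
stands in for MLlib vectors, and Gradient, RegType, NoReg, L2Reg, 
StatelessOptimizer and BoundOptimizer are hypothetical stand-ins.

```scala
// Hypothetical sketch of the two options; not actual MLlib classes.
object OptimizerApiSketch {
  type Weights = Array[Double]

  // Stand-ins for the model-side parameters.
  trait Gradient { def compute(weights: Weights): Weights }
  sealed trait RegType
  case object NoReg extends RegType
  final case class L2Reg(lambda: Double) extends RegType

  // Option 1: gradient and regType are passed as method parameters.
  class StatelessOptimizer(stepSize: Double) {
    def step(current: Weights, gradient: Gradient, regType: RegType): Weights = {
      val g = gradient.compute(current)
      val lambda = regType match {
        case NoReg     => 0.0
        case L2Reg(l)  => l
      }
      // Gradient step on loss + 0.5 * lambda * ||w||^2.
      Array.tabulate(current.length) { i =>
        current(i) - stepSize * (g(i) + lambda * current(i))
      }
    }
  }

  // Option 2: the GLM constructs an optimizer bound to its gradient and regType.
  class BoundOptimizer(gradient: Gradient, regType: RegType, stepSize: Double) {
    private val inner = new StatelessOptimizer(stepSize)
    def step(current: Weights): Weights = inner.step(current, gradient, regType)
  }

  def main(args: Array[String]): Unit = {
    // Gradient of 0.5 * ||w - 1||^2.
    val g = new Gradient { def compute(w: Weights): Weights = w.map(_ - 1.0) }
    val a = new StatelessOptimizer(0.5).step(Array(0.0, 0.0), g, NoReg)
    val b = new BoundOptimizer(g, NoReg, 0.5).step(Array(0.0, 0.0))
    println(s"option 1: ${a.mkString(", ")}; option 2: ${b.mkString(", ")}")
  }
}
```

In either form the optimizer itself only owns optimizer parameters (here, the 
step size), which is the separation the proposal asks for.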




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-08 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486184#comment-14486184
 ] 

Joseph K. Bradley commented on SPARK-5256:
--

(Comment related to link to [SPARK-6682]) Builder methods for GLMs have issues 
because of the Optimizer API.  (See discussion above: The constructors for 
Optimizer require Gradient and Updater.)  Cleaning up the Optimizer API will 
facilitate moving to the builder API.




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-04-08 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486196#comment-14486196
 ] 

Joseph K. Bradley commented on SPARK-5256:
--

*Q*: Do we like the Updater concept?

*Proposal*: No.  It conflates the regularization type with the 
regularization-related update.  The regularization type should be a model 
parameter.  The update function should depend on the model's regularization 
type and the optimizer.  There are only two such update functions we need 
currently: (sub)gradient step (for L1 or L2) and projection (for L1).  We could 
add more later.
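
A rough sketch of that split, with the regularization type as plain data and 
the update functions kept separate. The names are hypothetical. For the L1 
case the second function uses soft-thresholding (a proximal-style update); 
treating that as the "projection" update mentioned above is an assumption.

```scala
// Hypothetical sketch; RegType and both update functions are illustrative names.
object RegUpdateSketch {
  sealed trait RegType
  case object L1 extends RegType
  case object L2 extends RegType

  // (Sub)gradient step: add the regularizer's (sub)gradient to the loss gradient.
  def subgradientStep(w: Array[Double], lossGrad: Array[Double], reg: RegType,
                      lambda: Double, stepSize: Double): Array[Double] =
    Array.tabulate(w.length) { i =>
      val regGrad = reg match {
        case L1 => lambda * math.signum(w(i)) // a subgradient of lambda * |w_i|
        case L2 => lambda * w(i)              // gradient of 0.5 * lambda * w_i^2
      }
      w(i) - stepSize * (lossGrad(i) + regGrad)
    }

  // L1 only: take the loss-gradient step, then shrink each weight towards zero,
  // setting it to exactly zero if it is within stepSize * lambda of zero.
  def softThresholdStep(w: Array[Double], lossGrad: Array[Double],
                        lambda: Double, stepSize: Double): Array[Double] =
    Array.tabulate(w.length) { i =>
      val moved = w(i) - stepSize * lossGrad(i)
      math.signum(moved) * math.max(0.0, math.abs(moved) - stepSize * lambda)
    }
}
```

The optimizer would pick one of these based on the model's RegType, so the 
model never has to hand over an Updater object.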




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-01-21 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286706#comment-14286706
 ] 

Alexander Ulanov commented on SPARK-5256:
-

I've implemented my proposal with Vector as output in 
https://issues.apache.org/jira/browse/SPARK-5362




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-01-14 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277858#comment-14277858
 ] 

Joseph K. Bradley commented on SPARK-5256:
--

Generalization: grouped optimization




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-01-14 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277857#comment-14277857
 ] 

Joseph K. Bradley commented on SPARK-5256:
--

Improving Updater concept




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-01-14 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277986#comment-14277986
 ] 

Alexander Ulanov commented on SPARK-5256:
-

I would like to improve the Gradient interface so that it can process 
something more general than `Label` (which is relevant only to classifiers and 
not to other machine learning methods) and also allow batch processing. The 
simplest way of doing this, to my mind, is to add another function to the 
`Gradient` interface:

def compute(data: Vector, output: Vector, weights: Vector, cumGradient: 
Vector): Double

In the `Gradient` trait it should call `compute` with `label`. Of course, one 
needs to make some adjustments to the LBFGS and GradientDescent optimizers, 
replacing label: Double with output: Vector. 

For batch processing, one can stack the data and output points into one long 
vector (matrices are stored this way in Breeze) and pass them through the 
proposed interface.
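
One possible reading of this proposal, sketched in plain Scala. Array[Double] 
stands in for MLlib's Vector, and having the new overload forward output(0) to 
the label-based `compute` by default is an assumption about what "call 
`compute` with `label`" means; the actual change in SPARK-5362 may differ.

```scala
// Hypothetical sketch; Vec stands in for org.apache.spark.mllib.linalg.Vector.
object GradientInterfaceSketch {
  type Vec = Array[Double]

  trait Gradient {
    // Existing label-based form, as implemented by current gradients.
    // Adds the gradient into cumGradient and returns the loss.
    def compute(data: Vec, label: Double, weights: Vec, cumGradient: Vec): Double

    // Proposed vector-output form. By default it forwards the first output
    // component as the label; vector-valued models would override it.
    def compute(data: Vec, output: Vec, weights: Vec, cumGradient: Vec): Double =
      compute(data, output(0), weights, cumGradient)
  }

  // Least-squares gradient written against the existing label-based form.
  object LeastSquares extends Gradient {
    def compute(data: Vec, label: Double, weights: Vec, cumGradient: Vec): Double = {
      val prediction = data.indices.map(i => data(i) * weights(i)).sum
      val diff = prediction - label
      for (i <- data.indices) cumGradient(i) += diff * data(i)
      0.5 * diff * diff
    }
  }

  def main(args: Array[String]): Unit = {
    val cum = Array(0.0, 0.0)
    // Calls the vector-output overload, which delegates to the label form.
    val loss = LeastSquares.compute(Array(1.0, 2.0), Array(1.0), Array(0.5, 0.5), cum)
    println(s"loss = $loss, cumGradient = ${cum.mkString(", ")}")
  }
}
```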




[jira] [Commented] (SPARK-5256) Improving MLlib optimization APIs

2015-01-14 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277988#comment-14277988
 ] 

Alexander Ulanov commented on SPARK-5256:
-

Also, asynchronous gradient update might be a good thing to have.
