[jira] [Comment Edited] (SPARK-5256) Improving MLlib optimization APIs

2015-04-14 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494579#comment-14494579
 ] 

Alexander Ulanov edited comment on SPARK-5256 at 4/14/15 6:48 PM:
--

[~shivaram] Indeed, performance is orthogonal to API design. Still, 
well-designed things should work efficiently, shouldn't they? :)



> Improving MLlib optimization APIs
> -
>
> Key: SPARK-5256
> URL: https://issues.apache.org/jira/browse/SPARK-5256
> Project: Spark
>  Issue Type: Umbrella
>  Components: MLlib
>Affects Versions: 1.2.0
>Reporter: Joseph K. Bradley
>
> *Goal*: Improve APIs for optimization
> *Motivation*: There have been several disjoint mentions of improving the 
> optimization APIs to make them more pluggable, extensible, etc.  This JIRA is 
> a place to discuss what API changes are necessary for the long term, and to 
> provide links to other relevant JIRAs.
> Eventually, I hope this leads to a design doc outlining:
> * current issues
> * requirements such as supporting many types of objective functions, 
> optimization algorithms, and parameters to those algorithms
> * ideal API
> * breakdown of smaller JIRAs needed to achieve that API
> I will soon create an initial design doc, and I will try to watch this JIRA 
> and include ideas from JIRA comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5256) Improving MLlib optimization APIs

2015-04-14 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494568#comment-14494568
 ] 

Alexander Ulanov edited comment on SPARK-5256 at 4/14/15 6:43 PM:
--

Data large enough to require Spark suggests that the learning algorithm 
will be limited by time rather than by data. According to the paper "The tradeoffs of 
large scale learning", SGD converges significantly faster than batch GD 
in this regime. My use case is machine learning on large data, in particular 
time series.

For reference, a link to the paper:
http://papers.nips.cc/paper/3323-the-tradeoffs-of-large-scale-learning.pdf
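
A minimal sketch of the per-pass difference between the two methods, on a toy noise-free 1-D least-squares problem. It is not taken from the paper and does not reproduce its analysis. It only illustrates that one pass over n examples gives batch GD a single weight update while SGD gets n updates. The object name, the data, and the step size are assumptions chosen for illustration.

```scala
// Toy comparison (hypothetical setup): fit w in y = w * x, true w = 3.
object SgdVsBatchSketch {
  val n = 100
  // Noise-free data with x in (0, 1].
  val xs: Array[Double] = Array.tabulate(n)(i => (i + 1).toDouble / n)
  val ys: Array[Double] = xs.map(_ * 3.0)
  val stepSize = 0.5

  /** One pass of batch GD: a single update using the gradient averaged over all n examples. */
  def batchPass(w: Double): Double = {
    val grad = xs.zip(ys).map { case (x, y) => (w * x - y) * x }.sum / n
    w - stepSize * grad
  }

  /** One pass of SGD: n updates, each using the gradient of a single example. */
  def sgdPass(w: Double): Double =
    xs.zip(ys).foldLeft(w) { case (cur, (x, y)) =>
      cur - stepSize * (cur * x - y) * x
    }
}
```

Starting both from w = 0, `SgdVsBatchSketch.sgdPass(0.0)` ends much closer to 3 than `SgdVsBatchSketch.batchPass(0.0)` after reading the same data once. Real data is noisy, so this toy overstates the gap.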






[jira] [Comment Edited] (SPARK-5256) Improving MLlib optimization APIs

2015-04-08 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486206#comment-14486206
 ] 

Joseph K. Bradley edited comment on SPARK-5256 at 4/8/15 10:41 PM:
---

*Q*: Should Optimizer store Gradient and Updater?

*Proposal*: No.  Gradient and Updater (regularization type) are model 
parameters, not Optimizer parameters.  The generalized linear algorithm should 
take Optimizer, Gradient, and regularization type as separate parameters.  
Internally, the GLM can pass the Gradient and reg type to the Optimizer, either 
as method parameters:
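{code}
// Option 1: pass the Gradient and reg type on each call
val newWeights = optimizer.step(currentWeights, gradient, regType)
{code}
or by constructing a specific optimizer:
{code}
// Option 2: bind the Gradient and reg type when constructing the optimizer
val optimizer = new Optimizer(gradient, regType)
val newWeights = optimizer.step(currentWeights)
{code}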
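
To make the two options concrete, here is a minimal self-contained sketch. All of the types below (`Gradient`, `RegType`, `NoReg`, `L2Reg`, `PerCallOptimizer`, `BoundOptimizer`) and the `stepSize` parameter are hypothetical names for illustration; they are not the existing MLlib classes and not a proposed final API.

```scala
// Hypothetical interfaces, for illustration only.
trait Gradient {
  /** Gradient of the loss evaluated at `weights`. */
  def compute(weights: Array[Double]): Array[Double]
}

sealed trait RegType
case object NoReg extends RegType
final case class L2Reg(lambda: Double) extends RegType

/** Option 1: the optimizer holds no model state; the GLM passes gradient and regType per call. */
class PerCallOptimizer(stepSize: Double) {
  def step(weights: Array[Double], gradient: Gradient, regType: RegType): Array[Double] = {
    val g = gradient.compute(weights)
    val lambda = regType match {
      case NoReg     => 0.0
      case L2Reg(l)  => l
    }
    // Plain gradient step on loss + (lambda / 2) * ||w||^2.
    Array.tabulate(weights.length)(i => weights(i) - stepSize * (g(i) + lambda * weights(i)))
  }
}

/** Option 2: the GLM constructs an optimizer already bound to gradient and regType. */
class BoundOptimizer(gradient: Gradient, regType: RegType, stepSize: Double) {
  private val inner = new PerCallOptimizer(stepSize)
  def step(weights: Array[Double]): Array[Double] = inner.step(weights, gradient, regType)
}
```

In both shapes the Gradient and reg type come from the GLM, so the user-facing Optimizer parameters stay independent of the model. The second shape simply captures them once instead of threading them through every call.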

*Another note*: [~avulanov] pointed out on the dev list that, in general, the 
Gradient and Updater do need to be tightly coupled so that both know which 
weight is the intercept/bias term (which is not regularized).  If the GLM takes both 
as parameters as in this proposal, it could be responsible for informing the 
Gradient and Updater of which weight is the intercept.
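
One way this could look, as a rough sketch: the GLM decides the weight layout and hands the same intercept index to both collaborators. The names `InterceptAware`, `LeastSquaresGradientSketch`, `L2UpdaterSketch`, and `GlmSketch` are hypothetical, and placing the intercept last is an assumption made only for this example.

```scala
// Hypothetical types, for illustration only; not MLlib's Gradient/Updater.
trait InterceptAware {
  /** Position of the intercept in the weight vector, if the model has one. */
  def interceptIndex: Option[Int]
}

/** Least-squares gradient for one example; treats the intercept's feature as a constant 1.0. */
class LeastSquaresGradientSketch(val interceptIndex: Option[Int]) extends InterceptAware {
  def compute(features: Array[Double], label: Double, weights: Array[Double]): Array[Double] = {
    // Expand the features to the weight layout by inserting 1.0 at the intercept position.
    val x = interceptIndex match {
      case Some(k) => (features.take(k) :+ 1.0) ++ features.drop(k)
      case None    => features
    }
    val residual = x.zip(weights).map { case (xi, wi) => xi * wi }.sum - label
    x.map(_ * residual)
  }
}

/** L2 updater that leaves the intercept unregularized. */
class L2UpdaterSketch(regParam: Double, val interceptIndex: Option[Int]) extends InterceptAware {
  def update(weights: Array[Double], gradient: Array[Double], stepSize: Double): Array[Double] =
    Array.tabulate(weights.length) { i =>
      val penalty = if (interceptIndex.contains(i)) 0.0 else regParam * weights(i)
      weights(i) - stepSize * (gradient(i) + penalty)
    }
}

/** The GLM owns the layout and informs both the gradient and the updater. */
class GlmSketch(numFeatures: Int, fitIntercept: Boolean, regParam: Double) {
  val interceptIndex: Option[Int] = if (fitIntercept) Some(numFeatures) else None
  val gradient = new LeastSquaresGradientSketch(interceptIndex)
  val updater = new L2UpdaterSketch(regParam, interceptIndex)

  def step(weights: Array[Double], features: Array[Double], label: Double,
           stepSize: Double): Array[Double] =
    updater.update(weights, gradient.compute(features, label, weights), stepSize)
}
```

With this arrangement neither the Gradient nor the Updater has to know about the other; they agree on the intercept only because the GLM gave them the same index.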


