Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/702#discussion_r12460502
--- Diff: docs/mllib-optimization.md ---
@@ -163,3 +177,100 @@ each iteration, to compute the gradient direction.
Available algorithms for gradient descent:
* [GradientDescent.runMiniBatchSGD](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
+
+### Limited-memory BFGS
+L-BFGS is currently only a low-level optimization primitive in `MLlib`. If you
+want to use L-BFGS in various ML algorithms such as linear regression and
+logistic regression, you have to pass the gradient of the objective function
+and the updater into the optimizer yourself, instead of using training APIs
+like
+[LogisticRegression.LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegression).
+See the example below. This will be addressed in the next release.
+
+L1 regularization using
+[Updater.L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.Updater)
+will not work, since the soft-thresholding logic in L1Updater is designed for
+gradient descent.
+
+The L-BFGS method
+[LBFGS.runLBFGS](api/scala/index.html#org.apache.spark.mllib.optimization.LBFGS)
+has the following parameters:
+
+* `gradient` is a class that computes the gradient of the objective function
+being optimized, i.e., with respect to a single training example, at the
+current parameter value. MLlib includes gradient classes for common loss
+functions, e.g., hinge, logistic, least-squares. The gradient class takes as
+input a training example, its label, and the current parameter value.
+* `updater` is a class originally designed for gradient descent which computes
+the actual gradient descent step. However, for L-BFGS we can obtain the
+gradient and loss of the regularization part of the objective function by
+ignoring the logic that only applies to gradient descent, such as the adaptive
+step size. We will later refactor this into a regularizer that replaces the
+updater, to separate the regularization logic from the step update.
+* `numCorrections` is the number of corrections used in the L-BFGS update.
+10 is recommended.
+* `maxNumIterations` is the maximum number of iterations that L-BFGS can be
+run for.
+* `regParam` is the regularization parameter when using regularization.
+* `return` is a tuple containing two elements. The first element is a column
+matrix containing weights for every feature, and the second element is an
+array containing the loss computed for every iteration.
+
+Here is an example of training binary logistic regression with L2
+regularization using the L-BFGS optimizer.
+{% highlight scala %}
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.mllib.util.MLUtils
+import org.apache.spark.mllib.classification.LogisticRegressionModel
+import breeze.linalg.{DenseVector => BDV}
+
+val data = MLUtils.loadLibSVMFile(sc, "mllib/data/sample_libsvm_data.txt")
+val numFeatures = data.take(1)(0).features.size
+
+// Split data into training (60%) and test (40%).
+val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
+
+// Prepend 1 to every training example as the intercept term.
+val training = splits(0).map(x =>
+ (x.label, Vectors.fromBreeze(
+ BDV.vertcat(BDV.ones[Double](1), x.features.toBreeze.toDenseVector)))
--- End diff --
I added `MLUtils.appendBias` recently. Could you switch to it?
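
A minimal sketch of what that change could look like, assuming `MLUtils.appendBias` simply appends the bias term 1.0 to a feature vector (note that it puts the intercept at the end of the vector rather than the front, so the weight indexing shifts accordingly):

```scala
import org.apache.spark.mllib.util.MLUtils

// Reuse `splits` from the example above; append the intercept term to every
// training example instead of building the augmented vector by hand with Breeze.
val training = splits(0).map(x => (x.label, MLUtils.appendBias(x.features)))
```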
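
For context, since the diff above is cut off before the optimizer is invoked, here is a rough sketch of how `LBFGS.runLBFGS` might be called with the parameters described in this hunk. The concrete values, the `convergenceTol` argument, and the `initialWeightsWithIntercept` name are illustrative assumptions, not part of this diff:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}

// Illustrative settings; tune these for a real job.
val numCorrections = 10
val convergenceTol = 1e-4
val maxNumIterations = 20
val regParam = 0.1
// Zero initial weights, with one extra slot for the intercept term.
val initialWeightsWithIntercept = Vectors.dense(new Array[Double](numFeatures + 1))

val (weightsWithIntercept, loss) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),
  new SquaredL2Updater(),
  numCorrections,
  convergenceTol,
  maxNumIterations,
  regParam,
  initialWeightsWithIntercept)
```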