srowen commented on a change in pull request #28349:
URL: https://github.com/apache/spark/pull/28349#discussion_r417365454
##########
File path:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala
##########
@@ -192,77 +219,119 @@ class LinearSVC @Since("2.2.0") (
instr.logNumClasses(numClasses)
instr.logNumFeatures(numFeatures)
- val (coefficientVector, interceptVector, objectiveHistory) = {
- if (numInvalid != 0) {
- val msg = s"Classification labels should be in [0 to ${numClasses -
1}]. " +
- s"Found $numInvalid invalid labels."
- instr.logError(msg)
- throw new SparkException(msg)
- }
-
- val featuresStd = summarizer.std.toArray
- val getFeaturesStd = (j: Int) => featuresStd(j)
- val regParamL2 = $(regParam)
- val bcFeaturesStd = instances.context.broadcast(featuresStd)
- val regularization = if (regParamL2 != 0.0) {
- val shouldApply = (idx: Int) => idx >= 0 && idx < numFeatures
- Some(new L2Regularization(regParamL2, shouldApply,
- if ($(standardization)) None else Some(getFeaturesStd)))
- } else {
- None
- }
+ if (numInvalid != 0) {
+ val msg = s"Classification labels should be in [0 to ${numClasses - 1}].
" +
+ s"Found $numInvalid invalid labels."
+ instr.logError(msg)
+ throw new SparkException(msg)
+ }
- val getAggregatorFunc = new HingeAggregator(bcFeaturesStd,
$(fitIntercept))(_)
- val costFun = new RDDLossFunction(instances, getAggregatorFunc,
regularization,
- $(aggregationDepth))
+ val featuresStd = summarizer.std.toArray
+ val getFeaturesStd = (j: Int) => featuresStd(j)
+ val regularization = if ($(regParam) != 0.0) {
+ val shouldApply = (idx: Int) => idx >= 0 && idx < numFeatures
+ Some(new L2Regularization($(regParam), shouldApply,
+ if ($(standardization)) None else Some(getFeaturesStd)))
+ } else None
+
+ def regParamL1Fun = (index: Int) => 0D
Review comment:
Total nit, but write `0.0` instead of `0D`?
##########
File path:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala
##########
@@ -154,31 +156,56 @@ class LinearSVC @Since("2.2.0") (
def setAggregationDepth(value: Int): this.type = set(aggregationDepth, value)
setDefault(aggregationDepth -> 2)
+ /**
+ * Set block size for stacking input data in matrices.
Review comment:
We might provide a little more comment about what this does. Increasing
it can improve performance, but at what risk — e.g., slowing down on sparse input?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]