[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

mengxr Wed, 30 Jul 2014 23:14:09 -0700

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1624#discussion_r15627471
  
    --- Diff: python/pyspark/mllib/regression.py ---
    @@ -109,18 +109,45 @@ class LinearRegressionModel(LinearRegressionModelBase):
         True
         """
     
    -
     class LinearRegressionWithSGD(object):
         @classmethod
    -    def train(cls, data, iterations=100, step=1.0,
    -              miniBatchFraction=1.0, initialWeights=None):
    -        """Train a linear regression model on the given data."""
    +    def train(cls, data, iterations=100, step=1.0, regParam=1.0, regType=None,
    +              intercept=False, miniBatchFraction=1.0, initialWeights=None):
    +        """
    +        Train a linear regression model on the given data.
    +
    +        @param data:              The training data.
    +        @param iterations:        The number of iterations (default: 100).
    +        @param step:              The step parameter used in SGD
    +                                  (default: 1.0).
    +        @param regParam:          The regularizer parameter (default: 1.0).
    +        @param regType:           The type of regularizer used for training
    +                                  our model.
    +                                  Allowed values: "l1" for using L1Updater,
    +                                                  "l2" for using
    +                                                       SquaredL2Updater,
    +                                                  "none" for no regularizer.
    +                                  (default: None)
    +        @param intercept:         Boolean parameter which indicates the use
    +                                  or not of the augmented representation for
    +                                  training data (i.e. whether bias features
    +                                  are activated or not).
    +        @param miniBatchFraction: Fraction of data to be used for each SGD
    +                                  iteration.
    +        @param initialWeights:    The initial weights (default: None).
    +        """
             sc = data.context
    -        train_f = lambda d, i: sc._jvm.PythonMLLibAPI().trainLinearRegressionModelWithSGD(
    -            d._jrdd, iterations, step, miniBatchFraction, i)
    +        if regType is None:
    +            train_f = lambda d, i: sc._jvm.PythonMLLibAPI().trainLinearRegressionModelWithSGD(
    --- End diff --
    
    To avoid writing the long JVM call twice, you can normalize regType first and validate it once:
    
    ~~~
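    if regType is None:
        regType = "none"
    if regType in {"l2", "l1", "none"}:
        train_f = ...  # build train_f once, passing regType through
    else:
        raise ValueError("Invalid value for 'regType' parameter: %s" % regType)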
    ~~~
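A minimal, self-contained sketch of the suggested pattern, assuming the JVM call is replaced by a hypothetical `backend` callable so it runs without Spark. The names `_normalize_reg_type`, `backend`, and the module-level `train` are illustrative only, not PySpark API; the allowed values come from the docstring in the diff above.

~~~python
# Allowed regType values, as listed in the docstring under review.
_ALLOWED_REG_TYPES = {"l1", "l2", "none"}


def _normalize_reg_type(regType):
    """Map None to "none" and reject anything outside the allowed set."""
    if regType is None:
        regType = "none"
    if regType not in _ALLOWED_REG_TYPES:
        raise ValueError("Invalid value for 'regType' parameter: %r" % (regType,))
    return regType


def train(data, iterations=100, step=1.0, regParam=1.0, regType=None,
          intercept=False, miniBatchFraction=1.0, initialWeights=None,
          backend=None):
    """Hypothetical stand-in for LinearRegressionWithSGD.train.

    `backend` plays the role of
    sc._jvm.PythonMLLibAPI().trainLinearRegressionModelWithSGD (assumed
    here to accept regParam, regType and intercept as extra arguments).
    """
    regType = _normalize_reg_type(regType)
    # One callable covers every regType, so the long call is written once.
    train_f = lambda d, i: backend(d, iterations, step, regParam, regType,
                                   intercept, miniBatchFraction, i)
    return train_f(data, initialWeights)
~~~

For example, `train(rdd, regType="l2", backend=some_backend)` forwards "l2" unchanged, `regType=None` forwards "none", and an unknown value such as "l3" fails fast with a ValueError before any backend call is made.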


