AFAIK, we can guarantee that, with or without standardization, the models
always converge to the same solution if there is no regularization.
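You can verify it directly. Here is a minimal sketch (it assumes you already
have a training DataFrame named `training` with the usual "label" and
"features" columns; those names are just for illustration):

import org.apache.spark.ml.classification.LogisticRegression

// No regularization, standardization on (the default).
val modelStd = new LogisticRegression()
  .setRegParam(0.0)
  .setStandardization(true)
  .fit(training)

// No regularization, standardization off.
val modelNoStd = new LogisticRegression()
  .setRegParam(0.0)
  .setStandardization(false)
  .fit(training)

// The two solutions should match up to the solver's tolerance.
println(modelStd.coefficients)
println(modelNoStd.coefficients)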
You can also refer to the test cases at:
https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala#L551
https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala#L588

Thanks
Yanbo

On Mon, Oct 10, 2016 at 7:27 AM, Sean Owen <so...@cloudera.com> wrote:

> (BTW I think it means "when no standardization is applied", which is how
> you interpreted it, yes.) I think it just means that if feature i is
> divided by s_i, then its coefficients in the resulting model will end up
> larger by a factor of s_i. They have to be divided by s_i to put them back
> on the same scale as the unnormalized inputs. I don't think that in
> general it will result in exactly the same model, because part of the
> point of standardizing is to improve convergence. You could propose a
> rewording of the two occurrences of this paragraph if you like.
>
> On Mon, Oct 10, 2016 at 3:15 PM Cesar <ces...@gmail.com> wrote:
>
>> I have a question regarding how the default standardization in the ML
>> version of the Logistic Regression (Spark 1.6) works.
>>
>> Specifically, about the following comments in the Spark code:
>>
>> /**
>>  * Whether to standardize the training features before fitting the model.
>>  * The coefficients of models will be always returned on the original
>>  * scale, so it will be transparent for users. Note that with/without
>>  * standardization, the models should be always converged to the same
>>  * solution when no regularization is applied. In R's GLMNET package,
>>  * the default behavior is true as well.
>>  * Default is true.
>>  *
>>  * @group setParam
>>  */
>>
>> Specifically, I am having issues with understanding why the solution
>> should converge to the same weight values with/without standardization?
>>
>> Thanks!
>> --
>> Cesar Flores
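P.S. To make the s_i rescaling that Sean describes concrete: if feature i is
divided by s_i before training, the coefficient learned in the scaled space
comes out s_i times larger, so dividing it by s_i puts it back on the
original scale. A toy sketch (plain Scala, all values made up; this is not
the actual Spark internal code):

// Coefficients learned on standardized features x_i / s_i.
val scaledCoefficients = Array(1.2, -0.3, 4.5)
// Per-feature standard deviations s_i used for the scaling.
val featureStd = Array(2.0, 0.5, 10.0)

// Divide by s_i to recover the original-scale coefficients;
// constant (zero-variance) features get a zero coefficient.
val originalScaleCoefficients = scaledCoefficients.zip(featureStd).map {
  case (w, s) => if (s != 0.0) w / s else 0.0
}
// -> Array(0.6, -0.6, 0.45)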