GitHub user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11247#issuecomment-185826603
@yanboliang In #7080, it was intentional that `standardization = false` runs
through the same code path as `standardization = true`; without
regularization, it can be proven mathematically that both converge to the
same solution.
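As a sketch of that argument (my notation; it ignores the intercept and mean centering), training on standardized features is just a reparameterization of the original problem, so the two objectives take the same values and share minimizers:

```latex
% Sketch (my notation; ignores the intercept and mean centering).
% \sigma_j is the standard deviation of feature j, and
% \tilde{x}_{ij} = x_{ij} / \sigma_j are the standardized features.
% Substituting w_j = \tilde{w}_j / \sigma_j gives
% \tilde{w}^{\top} \tilde{x}_i = w^{\top} x_i, hence
\[
  \min_{\tilde{w}} \sum_{i=1}^{n} \ell\bigl(y_i,\ \tilde{w}^{\top}\tilde{x}_i\bigr)
  \;=\;
  \min_{w} \sum_{i=1}^{n} \ell\bigl(y_i,\ w^{\top}x_i\bigr),
\]
% so without a penalty term the two routes share the same minimizers,
% up to this rescaling of the coefficients.
```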
With your change, it seems that when `standardization = false`, the
features are not standardized at all, and this will cause convergence issues
when the features have very different scales. To avoid this, the features
are always standardized internally no matter what, and then each component is
penalized differently to correct for this effect.
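Here is a minimal sketch of that correction (illustrative names, simplified from the actual cost function): the optimizer always works on coefficients in the standardized space, and when `standardization = false` the L2 term on each component is divided by the squared feature standard deviation, so the effective penalty matches one in the original feature space.

```scala
// Simplified sketch, not the exact Spark internals: L2 penalty computed on
// coefficients in the standardized space, with a per-component correction
// when the user asked for standardization = false.
def l2Penalty(
    coefficients: Array[Double],  // coefficients in the standardized space
    featuresStd: Array[Double],
    regParamL2: Double,
    standardization: Boolean): Double = {
  var sum = 0.0
  var j = 0
  while (j < coefficients.length) {
    val v = coefficients(j)
    sum += {
      if (standardization) {
        v * v
      } else if (featuresStd(j) != 0.0) {
        // Penalize each component differently: dividing by featuresStd(j)^2
        // makes the penalty equivalent to one in the original feature space.
        v * (v / (featuresStd(j) * featuresStd(j)))
      } else {
        0.0  // constant feature: no penalty contribution
      }
    }
    j += 1
  }
  0.5 * regParamL2 * sum
}
```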
I wonder if your tests that converge in fewer iterations are a special case,
since without standardization some of the datasets did not converge when I
tested this. It may help some datasets, but I don't know whether it will
break someone's working training.
BTW, can you submit the bug fix for `LogisticRegression.scala` in a
separate PR?
```scala
optimizer.getUpdater() match {
  // elasticNetParam = 0.0 is pure L2, 1.0 is pure L1
  case _: SquaredL2Updater => runWithMlLogisticRegression(0.0)
  case _: L1Updater => runWithMlLogisticRegression(1.0)
}
```
Thanks.