Github user selvinsource commented on the pull request:

    https://github.com/apache/spark/pull/3062#issuecomment-96549372
  
    Here is my thinking.
    
    When `normalizationMethod = logit`, the predicted value is computed as
    `pj = 1 / (1 + exp(-yj))`.
    When we set `intercept = 0` for `targetCategory = 0`, then `yj = 0` => `pj = 1 / (1 + exp(0)) = 0.5`, which is in fact the default threshold for binary logistic classification.
    
    Say we want `pj = threshold`, where the threshold can be any value between 0 and 1, not just the default 0.5. Then we need to express yj in terms of the given pj: inverting the logit gives `exp(-yj) = 1/pj - 1`, so `yj = -ln(1/pj - 1)`, where yj is the intercept and pj is the threshold. Hence `intercept = -ln(1/threshold - 1)`.
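
    To make the conversion concrete, here is a minimal standalone Scala sketch of both directions (the object and method names are illustrative, not the exporter's actual API):

    ```scala
    object ThresholdIntercept {
      // PMML logit normalization: pj = 1 / (1 + exp(-yj))
      def logit(yj: Double): Double = 1.0 / (1.0 + math.exp(-yj))

      // Intercept that makes the normalized value equal the threshold:
      // yj = -ln(1/threshold - 1)
      def interceptForThreshold(threshold: Double): Double = {
        require(threshold > 0.0 && threshold < 1.0, "threshold must be in (0, 1)")
        -math.log(1.0 / threshold - 1.0)
      }

      def main(args: Array[String]): Unit = {
        // The default threshold 0.5 maps to intercept 0, matching the case above.
        println(interceptForThreshold(0.5)) // 0.0
      }
    }
    ```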
    
    This still needs to be tested; in the meantime, what do @vruusmann @mengxr think?
    
    A practical example: `model.threshold = 0.6` => `intercept = -ln(1/threshold - 1) = 0.41` (as calculated by the PMML exporter) => `pj = 1 / (1 + exp(-0.41)) = 0.6` (computed by JPMML, recovering the original threshold).
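
    For reference, the round trip for this example can be checked with plain math functions (no exporter code involved):

    ```scala
    val threshold = 0.6
    // Intercept emitted for the given threshold: -ln(1/0.6 - 1) ≈ 0.405, i.e. 0.41 rounded
    val intercept = -math.log(1.0 / threshold - 1.0)
    // Applying the logit normalization recovers the threshold: 1 / (1 + exp(-0.405)) ≈ 0.6
    val recovered = 1.0 / (1.0 + math.exp(-intercept))
    ```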

