[ https://issues.apache.org/jira/browse/SPARK-22449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Teng Peng updated SPARK-22449:
------------------------------
    Description:
Currently, we only have AIC for GLM. BIC is another "similar" criterion that is widely used and implemented in all major statistical tools.

Positive reasons:
1. Completeness.
2. Useful for some users.

Negative reasons:
1. Not sure how many users would actually use BIC.

Possible Implementation:
1. Duplicate AIC's methods. Calculate the penalty term independently.
Pros: safe & consistent. Cons: duplication.
2. Let AIC & BIC share the log likelihood through a single method. Calculate the penalty term independently.
Pros: similar to scikit-learn. No duplication. Cons: less safe & consistent.

References:
1. https://stats.stackexchange.com/questions/577/is-there-any-reason-to-prefer-the-aic-or-bic-over-the-other
2. http://users.stat.umn.edu/~yangx374/papers/Pre-Print_2003-10_Biometrika.pdf

Thoughts?

  was:
Currently, we only have AIC for GLM. BIC is another "similar" criterion that is widely used and implemented in all major statistical tools.

Positive reasons:
1. Completeness.
2. Useful for some users.

Negative reasons:
1. Not sure how many users would actually use BIC.

Possible Implementation:
1. Duplicate almost the same methods for the log likelihood part. Calculate the penalty term independently.
Pros: safe & consistent. Cons: duplication.
2. Let AIC & BIC share the log likelihood through a single method. Calculate the penalty term independently.
Pros: similar to scikit-learn. No duplication. Cons: less safe & consistent.

References:
1. https://stats.stackexchange.com/questions/577/is-there-any-reason-to-prefer-the-aic-or-bic-over-the-other
2. http://users.stat.umn.edu/~yangx374/papers/Pre-Print_2003-10_Biometrika.pdf

Thoughts?

> Add BIC for GLM
> ---------------
>
>                 Key: SPARK-22449
>                 URL: https://issues.apache.org/jira/browse/SPARK-22449
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Teng Peng
>            Priority: Minor
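
For illustration, implementation option 2 (one shared log-likelihood routine, with each criterion adding its own penalty term) might look roughly like the sketch below. It is written in plain Python rather than against Spark's code base, it assumes a Gaussian family with known variance, and every function name in it is hypothetical. How `num_params` is counted (for example, whether a dispersion parameter is included) is left to the caller.

```python
import math


def gaussian_log_likelihood(y, mu, sigma2):
    """Shared piece: log likelihood of y under independent N(mu_i, sigma2).

    Hypothetical helper for this sketch; a real GLM would supply one
    such function per family.
    """
    n = len(y)
    rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


def aic(log_lik, num_params):
    """AIC = 2k - 2 log L: the penalty is 2 per parameter."""
    return 2 * num_params - 2 * log_lik


def bic(log_lik, num_params, num_obs):
    """BIC = k ln(n) - 2 log L: the penalty grows with sample size."""
    return num_params * math.log(num_obs) - 2 * log_lik


# Both criteria reuse the same log likelihood; only the penalty differs.
y = [1.0, 2.0, 3.0, 4.0]
mu = [1.1, 1.9, 3.2, 3.8]
log_lik = gaussian_log_likelihood(y, mu, sigma2=1.0)
print("AIC:", aic(log_lik, num_params=2))
print("BIC:", bic(log_lik, num_params=2, num_obs=len(y)))
```

Under these definitions the two values differ by exactly `num_params * (ln(num_obs) - 2)`, so BIC penalizes extra parameters more heavily than AIC once the sample has at least 8 observations (ln 8 > 2).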