MGerlach added a comment.
In T282563#7252149 <https://phabricator.wikimedia.org/T282563#7252149>, @awight wrote:

> In T282563#7251679 <https://phabricator.wikimedia.org/T282563#7251679>, @GoranSMilovanovic wrote:
>
>> Anyways, following a series of cross-validations and tricks to account for a highly imbalanced dataset, one Random Forest classifier was able to predict leave vs stay in Wikidata with:
>>
>> - **Accuracy of 97%**,
>> - **Hit rate (True Positive Rate, TPR) of 90%**,
>> - and a **False Alarm (False Positive Rate, FPR) of only 2.8%**.
>
> What about the true/false negative rate? To my untrained eye, these numbers look typical for an imbalanced training/test set, where we have a lot of people abandoning so it's really easy for a classifier to accurately predict that a user will leave, but probably much less accurate at predicting that a person will stay.

I agree with @awight. The high accuracy should not be taken at face value, since the positive and negative classes are probably highly imbalanced (I am not sure this is the case, but it looks like most accounts stop editing very quickly). Two options to make the numbers more interpretable:

- Compare with a baseline predictor that does not use any of the features. This could be either a random guess (for example, based on the Lindy curve) or simply always predicting the majority class.
- Use a balanced test set with the same number of positive and negative examples (for example, by downsampling the majority class or upsampling the minority class).

TASK DETAIL
https://phabricator.wikimedia.org/T282563

To: GoranSMilovanovic, MGerlach
Cc: Pablo, Mohammed_Sadat_WMDE, Tobi_WMDE_SW, MGerlach, awight, WMDE-leszek, Manuel, Lydia_Pintscher, Aklapper, Jan_Dittrich, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
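A minimal Python sketch of both suggestions, assuming binary labels where 1 = "leave" (the positive class) and 0 = "stay". The function names and the 95/5 toy split are hypothetical illustrations, not the setup or data used in this task. It reports all four confusion-matrix rates (so the true/false negative rates asked about above are visible), an always-majority-class baseline, and a downsampled balanced test set.

```python
import random
from collections import Counter


def confusion_rates(y_true, y_pred, positive=1):
    """Return accuracy plus all four confusion-matrix rates for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined rates (an empty class) are reported as NaN instead of raising.
        return num / den if den else float("nan")

    tpr = ratio(tp, tp + fn)  # hit rate / recall
    tnr = ratio(tn, tn + fp)  # specificity
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "tpr": tpr,
        "fpr": ratio(fp, fp + tn),  # false alarm rate
        "tnr": tnr,
        "fnr": ratio(fn, fn + tp),  # miss rate
        "balanced_accuracy": (tpr + tnr) / 2,
    }


def majority_baseline(y_train, n):
    """Feature-free baseline: always predict the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * n


def downsample_balanced(X, y, seed=0):
    """Downsample every class to the size of the smallest one (balanced test set)."""
    rng = random.Random(seed)
    by_label = {}
    for x, label in zip(X, y):
        by_label.setdefault(label, []).append(x)
    n_min = min(len(items) for items in by_label.values())
    X_bal, y_bal = [], []
    for label, items in by_label.items():
        for x in rng.sample(items, n_min):
            X_bal.append(x)
            y_bal.append(label)
    return X_bal, y_bal


if __name__ == "__main__":
    # Hypothetical toy test set: 95 accounts leave (1), 5 stay (0).
    # The same labels stand in for training labels when fitting the baseline.
    y_test = [1] * 95 + [0] * 5
    X_test = list(range(len(y_test)))  # placeholder feature rows

    baseline = majority_baseline(y_test, len(y_test))
    print(confusion_rates(y_test, baseline))
    # accuracy 0.95 and TPR 1.0, but FPR 1.0 and TNR 0.0: every "stay" is missed

    X_bal, y_bal = downsample_balanced(X_test, y_test)
    print(Counter(y_bal))
    print(confusion_rates(y_bal, majority_baseline(y_test, len(y_bal))))
    # on the balanced set the same baseline drops to accuracy 0.5
```

A real classifier is only informative if it clearly beats the baseline on the metrics the baseline fails at (TNR, balanced accuracy), not just on raw accuracy.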
_______________________________________________
Wikidata-bugs mailing list
