GoranSMilovanovic added a comment.
@awight First of all, I neglected to mention that the outcome variable (i.e. what we are predicting) is **"stay"**, not "leave". My bad.

> I'm unsure whether "positive" here means the classifier identifies a person who will leave or stay, btw., can you share more about the test results?

These terms have one and the same meaning in Statistical Decision Theory and ML, always; see ROC Analysis on Wikipedia <https://en.wikipedia.org/wiki/Receiver_operating_characteristic>. Since the outcome variable is "stay", a "positive" is a prediction that the user will stay.

> What about the true/false negative rate?

Well, they are just 1 minus their positive counterparts, right? (TNR = 1 - FPR and FNR = 1 - TPR.)

> To my untrained eye, these numbers look typical for an imbalanced training/test set, where we have a lot of people abandoning so it's really easy for a classifier to accurately predict that a user will leave, but probably much less accurate at predicting that a person will stay.

On the contrary, the reported low FA (false alarm) rate means that the model is good at avoiding the Type I Error, i.e. predicting that someone would stay when they actually left. The dataset is still very imbalanced, but there are techniques to deal with that, and I've used some of them here.

> I like your "median length of inactivity" measure, that could be a good single-parameter predictor.

Could be, don't know yet.

> Of course, there is some risk of this being tautological: e.g. if a user is absent for a median of 5 months then they are roughly 50% likely to be absent for another 5 months (therefore considered "abandoned") in the future.

Wouldn't that hold only if the Lindy effect and a power law hold too? But I think they do not; see T282563#7250712 <https://phabricator.wikimedia.org/T282563#7250712>.

**N.B.** I am still experimenting to see if the feature engineering process can give us even more information than we are using now. Then I will share the code and the data so that anyone can play with the model or build their own.
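To make the point about the negative rates concrete, here is a minimal Python sketch, assuming "stay" is the positive class as stated above. The names `confusion_counts` and `rates` and the toy labels are hypothetical illustrations only; they are not the model, the code, or the data from this task.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str = "stay") -> Dict[str, int]:
    """Count TP/FP/TN/FN, treating `positive` ("stay") as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted "stay": a hit if the user stayed, a false alarm if they left.
            counts["tp" if actual == positive else "fp"] += 1
        else:
            # Predicted "leave": a miss if the user stayed, a correct rejection otherwise.
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def rates(c: Dict[str, int]) -> Dict[str, float]:
    """Derive the four rates; the negative rates are 1 minus a positive rate."""
    tpr = c["tp"] / (c["tp"] + c["fn"])  # hit rate among users who stayed
    fpr = c["fp"] / (c["fp"] + c["tn"])  # false alarm (Type I) rate among users who left
    return {"TPR": tpr, "FPR": fpr, "TNR": 1.0 - fpr, "FNR": 1.0 - tpr}


# Hypothetical, imbalanced toy labels: 2 users stayed, 8 left.
y_true = ["stay", "stay"] + ["leave"] * 8
y_pred = ["stay", "leave", "stay"] + ["leave"] * 7

toy_counts = confusion_counts(y_true, y_pred)
toy_rates = rates(toy_counts)
```

With these toy labels the false alarm rate is low even though most users leave, because it is computed only over the users who actually left; the class imbalance changes how many cases fall in each row of the confusion matrix, not how the rates are defined.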
TASK DETAIL https://phabricator.wikimedia.org/T282563
