awight added a comment.
In T282563#7251679 <https://phabricator.wikimedia.org/T282563#7251679>, @GoranSMilovanovic wrote:

> Anyways, following a series of cross-validations and tricks to account for a highly imbalanced dataset, one Random Forest classifier was able to predict leave vs stay in Wikidata with:
>
> - **Accuracy of 97%**,
> - **Hit rate (True Positive Rate, TPP) of 90%**,
> - and a **False Alarm (False Positive Rate, FPP) of only 2.8%**.

What about the true and false negative rates? To my untrained eye, these numbers look typical for an imbalanced training/test set: when a lot of people abandon, it's really easy for a classifier to accurately predict that a user will leave, but it is probably much less accurate at predicting that a person will stay. I'm also unsure whether "positive" here means the classifier identifies a person who will leave or one who will stay. Can you share more about the test results?

> The model encompasses the following features (MeanDecreaseGini is a measure of variable importance in Random Forests):

Thanks for including the relative importance of each feature. I like your "median length of inactivity" measure; it could be a good single-parameter predictor. Of course, there is some risk of this being tautological: e.g. if a user is absent for a median of 5 months, then they are roughly 50% likely to be absent for another 5 months (and therefore be considered "abandoned") in the future. It might help the exploration to run a tool like LIME on the model to learn more about how the features relate to the prediction.

TASK DETAIL
https://phabricator.wikimedia.org/T282563

To: GoranSMilovanovic, awight
Cc: Pablo, Mohammed_Sadat_WMDE, Tobi_WMDE_SW, MGerlach, awight, WMDE-leszek, Manuel, Lydia_Pintscher, Aklapper, Jan_Dittrich, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
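To make the question about negative rates concrete: for a binary classifier the negative rates are complements of the quoted ones (TNR = 1 - FPR, FNR = 1 - TPR). Assuming the quoted figures are standard rates, they imply a true negative rate of about 97.2% and a miss rate of about 10%. What they do not reveal is the class balance, or which class counts as "positive". The Python sketch below is illustrative only: the names `ConfusionMatrix`, `rates` and `flipped` are hypothetical and not part of the original analysis, and the counts are invented. It computes the usual rates from a confusion matrix and shows how a degenerate "everyone leaves" classifier on an imbalanced test set still scores 95% accuracy, while its TPR and FPR swing between 100% and 0% depending on which class is labeled positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; which class is "positive" is a labeling choice."""
    tp: int  # actual positives predicted positive (hits)
    fn: int  # actual positives predicted negative (misses)
    fp: int  # actual negatives predicted positive (false alarms)
    tn: int  # actual negatives predicted negative

    def rates(self) -> dict:
        pos = self.tp + self.fn  # number of actual positives
        neg = self.fp + self.tn  # number of actual negatives
        tpr = self.tp / pos      # hit rate / recall / sensitivity
        tnr = self.tn / neg      # specificity
        predicted_pos = self.tp + self.fp
        return {
            "accuracy": (self.tp + self.tn) / (pos + neg),
            "tpr": tpr,
            "fnr": 1 - tpr,      # miss rate is the complement of the hit rate
            "fpr": 1 - tnr,      # false alarm rate is the complement of specificity
            "tnr": tnr,
            # precision is undefined when nothing is predicted positive
            "precision": self.tp / predicted_pos if predicted_pos else float("nan"),
            # mean of the per-class recalls; 0.5 for a constant predictor
            "balanced_accuracy": (tpr + tnr) / 2,
        }

    def flipped(self) -> "ConfusionMatrix":
        """Same predictions, but with the other class treated as positive."""
        return ConfusionMatrix(tp=self.tn, fn=self.fp, fp=self.fn, tn=self.tp)


# Hypothetical test set: 950 users leave (positive), 50 stay.
# The classifier simply predicts "leave" for everyone.
always_leave = ConfusionMatrix(tp=950, fn=0, fp=50, tn=0)
print(always_leave.rates())            # accuracy 0.95, tpr 1.0, fpr 1.0, balanced_accuracy 0.5
print(always_leave.flipped().rates())  # accuracy 0.95, tpr 0.0, fpr 0.0, precision nan
```

Reporting balanced accuracy (or both per-class recalls) alongside plain accuracy is one common way to make results on an imbalanced test set comparable regardless of which class is called positive.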
