awight added a comment.

  In T282563#7251679 <https://phabricator.wikimedia.org/T282563#7251679>, 
@GoranSMilovanovic wrote:
  
  > Anyways, following a series of cross-validations and tricks to account for
  > a highly imbalanced dataset, one Random Forest classifier was able to predict
  > leave vs. stay in Wikidata with:
  >
  > - **Accuracy of 97%**,
  > - **Hit rate (True Positive Rate, TPR) of 90%**,
  > - and a **False Alarm (False Positive Rate, FPR) of only 2.8%**.
  
  What about the true/false negative rates?  To my untrained eye, these numbers
  look typical for an imbalanced training/test set: if a large share of users
  abandon, it is easy for a classifier to predict accurately that a user will
  leave, but it is probably much less accurate at predicting that a user will
  stay.  By the way, I'm unsure whether "positive" here means the classifier
  identifies a person who will leave or one who will stay.  Can you share more
  about the test results?
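  For reference, here is a minimal Python sketch of how these rates fall out of
  a confusion matrix. The counts are hypothetical (not from your model), and
  "positive" is assumed to mean "leaves". Note that TNR = 1 - FPR and
  FNR = 1 - TPR, so the numbers that depend on class balance, and might be
  worth reporting alongside them, are precision and negative predictive value
  (NPV). In this toy case a classifier that always predicts the majority class
  already reaches 95% accuracy.

```python
import math
from dataclasses import dataclass


def _ratio(num, den):
    # Return NaN instead of raising when a denominator is zero
    # (e.g. the classifier never predicts "stays").
    return math.nan if den == 0 else num / den


@dataclass
class Confusion:
    tp: int  # predicted "leaves", actually left
    fp: int  # predicted "leaves", actually stayed
    tn: int  # predicted "stays", actually stayed
    fn: int  # predicted "stays", actually left

    def rates(self):
        total = self.tp + self.fp + self.tn + self.fn
        return {
            "accuracy": _ratio(self.tp + self.tn, total),
            "tpr": _ratio(self.tp, self.tp + self.fn),  # hit rate
            "fpr": _ratio(self.fp, self.fp + self.tn),  # false alarm rate
            "tnr": _ratio(self.tn, self.tn + self.fp),  # equals 1 - fpr
            "fnr": _ratio(self.fn, self.fn + self.tp),  # equals 1 - tpr
            "precision": _ratio(self.tp, self.tp + self.fp),
            "npv": _ratio(self.tn, self.tn + self.fn),  # negative predictive value
        }


# Hypothetical test set: 950 users who left, 50 who stayed, and a trivial
# classifier that always predicts "leaves".
r = Confusion(tp=950, fp=50, tn=0, fn=0).rates()
print(r["accuracy"], r["tpr"], r["fpr"], r["tnr"])  # 0.95 1.0 1.0 0.0
```

  The FPR/TNR pair is what exposes the trivial classifier here; accuracy alone
  does not.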
  
  > The model encompasses the following features (MeanDecreaseGini is a measure
  > of variable importance in Random Forests):
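  As a rough illustration of what MeanDecreaseGini aggregates, the Python
  sketch below computes Gini impurity and the impurity decrease of a single
  split. The forest-level measure sums such decreases over all splits on a
  given feature and averages them over the trees (implementations differ in
  exactly how each node is weighted). The "leave"/"stay" labels and counts are
  made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`,
    with each child weighted by its share of the parent's samples."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node: 6 leavers and 4 stayers, split on some feature.
parent = ["leave"] * 6 + ["stay"] * 4
left = ["leave"] * 5 + ["stay"] * 1
right = ["leave"] * 1 + ["stay"] * 3
print(round(gini(parent), 4), round(gini_decrease(parent, left, right), 4))  # 0.48 0.1633
```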
  
  Thanks for including the relative importance of each feature.  I like your
  "median length of inactivity" measure; it could be a good single-parameter
  predictor.  Of course, there is some risk of this being tautological: for
  example, if a user's inactive periods last a median of 5 months, then they are
  roughly 50% likely to be absent for another 5 months in the future (and
  therefore be considered "abandoned").  It might help the exploration to run a
  tool like LIME on the model to learn more about how individual features relate
  to its predictions.
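  A toy Python check of the tautology concern, assuming a user's future
  inactive periods are drawn from the same distribution as past ones: by
  definition, about half of the gaps exceed their own median, so a threshold
  near that median is crossed roughly half the time. The gap values and helper
  names are hypothetical.

```python
from statistics import median


def share_of_gaps_longer_than(gaps, threshold):
    """Fraction of inactivity gaps strictly longer than `threshold` (same units)."""
    return sum(g > threshold for g in gaps) / len(gaps)


def tautology_check(past_gaps):
    """Return the user's median gap and the share of gaps exceeding it."""
    m = median(past_gaps)
    return m, share_of_gaps_longer_than(past_gaps, m)


# Hypothetical inactivity gaps for one user, in months.
gaps = [1, 3, 4, 6, 8, 10]
m, share = tautology_check(gaps)
print(m, share)  # 5.0 0.5
```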

TASK DETAIL
  https://phabricator.wikimedia.org/T282563

To: GoranSMilovanovic, awight
Cc: Pablo, Mohammed_Sadat_WMDE, Tobi_WMDE_SW, MGerlach, awight, WMDE-leszek, 
Manuel, Lydia_Pintscher, Aklapper, Jan_Dittrich, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list -- [email protected]