MGerlach added a comment.

  In T282563#7252149 <https://phabricator.wikimedia.org/T282563#7252149>, 
@awight wrote:
  
  > In T282563#7251679 <https://phabricator.wikimedia.org/T282563#7251679>, 
@GoranSMilovanovic wrote:
  >
  >> Anyways, following a series of cross-validations and tricks to account for 
a highly imbalanced dataset, one Random Forest classifier was able to predict 
leave vs stay in Wikidata with:
  >>
  >> - **Accuracy of 97%**,
  >> - **Hit rate (True Positive Rate, TPR) of 90%**,
  >> - and a **False Alarm (False Positive Rate, FPR) of only 2.8%**.
  >
  > What about the true/false negative rate?  To my untrained eye, these 
numbers look typical for an imbalanced training/test set, where we have a lot 
of people abandoning so it's really easy for a classifier to accurately predict 
that a user will leave, but probably much less accurate at predicting that a 
person will stay.
  
  I agree with @awight. The high accuracy should not be taken at face value, as 
the positive/negative groups are probably highly imbalanced (not sure if this 
is true but it looks like most accounts stop editing very quickly). Two options 
to make the numbers more interpretable:
  
  - Compare with a baseline predictor that does not use any of the features. 
This could be either a random guess (for example based on the Lindy curve) or 
simply always guessing the majority class.
  - Use a balanced test set with the same number of positive and negative 
examples (for example by downsampling the majority class or upsampling the 
minority class).
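
  A minimal sketch of both options, in plain Python. Everything here is 
hypothetical: the labels are made up (950 "leave" vs 50 "stay"), treating 
"leave" as the positive class (label 1) is my assumption, and the helper names 
`rates`, `majority_baseline` and `downsample_balanced` are only for 
illustration, not the code used in the analysis.

```python
import random


def rates(y_true, y_pred):
    """Accuracy, TPR, FPR and balanced accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tpr,
        "fpr": fpr,
        # Mean of TPR and TNR; 0.5 means no better than guessing.
        "balanced_accuracy": (tpr + (1.0 - fpr)) / 2,
    }


def majority_baseline(train_labels, n):
    """Feature-free baseline: always predict the most frequent training label."""
    majority = 1 if 2 * sum(train_labels) >= len(train_labels) else 0
    return [majority] * n


def downsample_balanced(y_true, y_pred, seed=0):
    """Keep an equal number of positive and negative test examples."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y_true) if t == 1]
    neg = [i for i, t in enumerate(y_true) if t == 0]
    n = min(len(pos), len(neg))
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    return [y_true[i] for i in keep], [y_pred[i] for i in keep]


# Made-up, highly imbalanced labels (assumption: 1 = leave, 0 = stay).
y_true = [1] * 950 + [0] * 50

# Option 1: score the feature-free baseline on the full, imbalanced set.
# For simplicity the same labels stand in for the training labels here.
baseline_pred = majority_baseline(y_true, len(y_true))
baseline = rates(y_true, baseline_pred)

# Option 2: score the same predictions on a balanced subsample.
bal_true, bal_pred = downsample_balanced(y_true, baseline_pred)
baseline_balanced = rates(bal_true, bal_pred)

print(baseline)
print(baseline_balanced)
```

  On these made-up labels the always-"leave" baseline already reaches 95% 
accuracy on the imbalanced set, but only 50% on the balanced one. Reporting the 
classifier next to such a baseline (or on a balanced test set) shows how much 
of the 97% actually comes from the features.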

TASK DETAIL
  https://phabricator.wikimedia.org/T282563

To: GoranSMilovanovic, MGerlach
Cc: Pablo, Mohammed_Sadat_WMDE, Tobi_WMDE_SW, MGerlach, awight, WMDE-leszek, 
Manuel, Lydia_Pintscher, Aklapper, Jan_Dittrich, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331