GoranSMilovanovic added a comment.

  @awight
  
  First of all, I may have forgotten to mention that the outcome variable (i.e. 
what we are predicting) is **"stay"**, not "leave".  My bad.
  
  > I'm unsure whether "positive" here means the classifier identifies a person 
who will leave or stay, btw., can you share more about the test results?
  
  These terms always have one and the same meaning in Statistical Decision 
Theory and ML; see the Wikipedia article on ROC analysis 
<https://en.wikipedia.org/wiki/Receiver_operating_characteristic>.
  
  > What about the true/false negative rate?
  
  Well, they are just 1 minus the complementary positive rates, right? That is, 
TNR = 1 - FPR and FNR = 1 - TPR.
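
  To make the relationship between the four rates concrete, here is a minimal sketch with "stay" as the positive class. The counts are hypothetical and chosen only for illustration; they are not the results of the model discussed in this task, and the function name `roc_rates` is made up for this example.

```python
def roc_rates(tp, fp, tn, fn):
    """Compute the four ROC rates from confusion-matrix counts.

    Positive class = "stay", negative class = "leave".
    """
    tpr = tp / (tp + fn)  # hit rate: stayed, predicted "stay"
    fnr = fn / (tp + fn)  # miss rate: stayed, predicted "leave"
    fpr = fp / (fp + tn)  # false alarm rate: left, predicted "stay" (Type I error)
    tnr = tn / (fp + tn)  # correct rejection: left, predicted "leave"
    return {"TPR": tpr, "FNR": fnr, "FPR": fpr, "TNR": tnr}


# Hypothetical counts, for illustration only.
r = roc_rates(tp=80, fp=5, tn=95, fn=20)

# Each negative rate is the complement of a positive rate over the same actual class.
print(r["TNR"], 1 - r["FPR"])
print(r["FNR"], 1 - r["TPR"])
```

  Note that the complements pair up crosswise: the true negative rate goes with the false positive rate, and the false negative rate with the true positive rate.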
  
  > To my untrained eye, these numbers look typical for an imbalanced 
training/test set, where we have a lot of people abandoning so it's really easy 
for a classifier to accurately predict that a user will leave, but probably 
much less accurate at predicting that a person will stay.
  
  On the contrary, the reported low FA rate means that the model is good at 
avoiding the Type I Error, i.e. predicting that someone would stay when they 
actually left. The dataset is still very imbalanced, but there are techniques 
to deal with that, and I've used some of them here.
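
  As an illustration of one common technique for imbalanced data (not necessarily one of those used for this model), here is a minimal sketch of inverse-frequency class weighting, where each class gets the weight n / (k * n_c) for n samples, k classes and n_c samples in class c. The label counts and the function name `balanced_class_weights` are assumptions made up for this example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * n_c)."""
    counts = Counter(labels)
    n = len(labels)   # total number of samples
    k = len(counts)   # number of distinct classes
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


# Hypothetical, heavily imbalanced labels: most users leave.
labels = ["leave"] * 90 + ["stay"] * 10
weights = balanced_class_weights(labels)

# The rare class ("stay") receives the larger weight.
print(weights)
```

  Weights like these are typically passed to the learner so that errors on the rare class cost more; resampling the training set is another option.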
  
  > I like your "median length of inactivity" measure, that could be a good 
single-parameter predictor.
  
  Could be, don't know yet.
  
  > Of course, there is some risk of this being tautological: e.g. if a user is 
absent for a median of 5 months then they are roughly 50% likely to be absent 
for another 5 months (therefore considered "abandoned") in the future.
  
  Wouldn't that hold only if the Lindy effect and a power law hold too? But I 
think they do not, see T282563#7250712 
<https://phabricator.wikimedia.org/T282563#7250712>.
  
  **N.B.** I am still experimenting to see if the feature engineering process 
can give us even more information than we are using now. Then I will share the 
code and the data so that anyone can play with the model or build their own.

TASK DETAIL
  https://phabricator.wikimedia.org/T282563

To: GoranSMilovanovic
Cc: Pablo, Mohammed_Sadat_WMDE, Tobi_WMDE_SW, MGerlach, awight, WMDE-leszek, 
Manuel, Lydia_Pintscher, Aklapper, Jan_Dittrich, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331