Halfak added a comment.

  It's not quite fair to compare the old and new feature sets. It does look 
like the property suggestor was having a minor positive effect, but that seems 
like it was not worth the additional API call. Everything that follows is just 
me nerding out about the stats.
  
  I think the statistics here are a bit off.  Where is the STD statistic coming 
from?
  
  If you apply the standard error of a proportion, `sqrt(p(1-p)/n)`, to the 
roc_auc, you get `sqrt(0.965*(1-0.965)/5000)` = 0.002599038283673. If you apply 
a proportion test to the difference noted in the final old/new feature 
comparison (which is unfair, but still interesting), you get a p-value of 
`0.24`, so the difference is not statistically significant. See 
https://www.infrrr.com/proportions/single-proportion-hypothesis-test-calculator 
I used 0.964 as the null proportion and 0.975 as the sample proportion with 
n = 5000.
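
  For anyone who wants to reproduce the standard error figure, here is a 
minimal sketch in Python. The function name `proportion_stderr` is just an 
illustrative helper, not something from the codebase, and treating roc_auc as 
a simple proportion is the rough approximation used in this comment, not an 
exact variance for AUC.

```python
import math


def proportion_stderr(p: float, n: int) -> float:
    """Standard error of a proportion: sqrt(p * (1 - p) / n).

    Hypothetical helper for illustration only. `p` is the observed
    proportion (here the roc_auc is treated as one, which is an
    approximation) and `n` is the number of observations.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


# Values taken from the comment: roc_auc = 0.965, n = 5000.
se = proportion_stderr(0.965, 5000)
print(f"{se:.6f}")  # prints 0.002599
```

  The result is roughly 0.0026, so differences in roc_auc that are much 
smaller than that are hard to tell apart from noise at this sample size.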

TASK DETAIL
  https://phabricator.wikimedia.org/T261850

To: Ladsgroup, Halfak
Cc: Halfak, Michael, Aklapper, GoranSMilovanovic, Lydia_Pintscher, 
guergana.tzatchkova, Hazizibinmahdi, Akuckartz, darthmon_wmde, Nandana, Lahi, 
Gq86, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, 
aude, Ladsgroup, Mbch331