Halfak added a comment.
It's not quite fair to compare the old and new feature sets. It does look like the property suggester was having a minor positive effect, but it does not seem to have been worth the additional API call.

Everything that follows is just me nerding out about the stats.

I think the statistics here are a bit off. Where is the STD statistic coming from? If you apply the standard error of a proportion, `sqrt(p(1-p)/n)`, to the roc_auc, you get `sqrt(0.965*(1-0.965)/5000)` = 0.002599038283673.

If you apply a proportion test to the difference noted in the final old/new feature comparison (which is unfair, but still interesting), you get a p-value of `0.24`, so the difference is not statistically significant. See https://www.infrrr.com/proportions/single-proportion-hypothesis-test-calculator. I used 0.964 as the null proportion and 0.975 as the sample proportion, with n = 5000.

TASK DETAIL
https://phabricator.wikimedia.org/T261850

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup, Halfak
Cc: Halfak, Michael, Aklapper, GoranSMilovanovic, Lydia_Pintscher, guergana.tzatchkova, Hazizibinmahdi, Akuckartz, darthmon_wmde, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Ladsgroup, Mbch331
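
For anyone who wants to check the standard-error figure quoted in the comment, here is a minimal sketch of the `sqrt(p(1-p)/n)` calculation. The function name `proportion_stderr` is hypothetical, and treating the ROC AUC as a simple proportion over n = 5000 observations is the comment's approximation, not an exact variance formula for AUC.

```python
import math


def proportion_stderr(p: float, n: int) -> float:
    """Standard error of a proportion: sqrt(p * (1 - p) / n).

    Hypothetical helper for illustration; p is the observed proportion
    (here the roc_auc, treated as a proportion) and n is the sample size.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


# Values taken from the comment: roc_auc = 0.965, n = 5000.
se = proportion_stderr(0.965, 5000)
print(f"{se:.6f}")  # prints 0.002599
```

The same helper can be called with other values of `p` and `n` to see how quickly the standard error shrinks as the sample size grows.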
