[Wikidata-bugs] [Maniphest] T261850: compare model accuracy with and without property suggester

2020-09-14 Thread Michael
Michael moved this task from Peer Review to Done on the Item Quality Scoring 
Improvement (Item Quality Scoring Improvement - Sprint 3) board.
Michael closed this task as "Resolved".
Michael added a comment.


  This has been done in #159.
  
  I agree that it is strange that property suggester does not have a bigger 
impact. Are its suggestions maybe not actually as useful as we had hoped? Would 
it maybe make sense to use the suggestions from the Recoin gadget?

TASK DETAIL
  https://phabricator.wikimedia.org/T261850

WORKBOARD
  https://phabricator.wikimedia.org/project/board/4952/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup, Michael
Cc: Halfak, Michael, Aklapper, GoranSMilovanovic, Lydia_Pintscher, 
guergana.tzatchkova, Hazizibinmahdi, Akuckartz, darthmon_wmde, Nandana, Lahi, 
Gq86, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, 
aude, Ladsgroup, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T261850: compare model accuracy with and without property suggester

2020-09-13 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  In T261850#6449787, @Michael wrote:
  
  > Great! I'll make a pull request for removing it. 
  >
  > Removing property suggester also has the positive side effect that our 
scores for dumps and the API should be the same again. cc @Lydia_Pintscher
  
  Let's do it! :)
  Random note: very curious to me that this doesn't have a bigger effect.



[Wikidata-bugs] [Maniphest] T261850: compare model accuracy with and without property suggester

2020-09-10 Thread Halfak
Halfak added a comment.


  It's not quite fair to compare the old and new feature sets.  It does look 
like the property suggester was having a minor positive effect, but that seems 
like it was not worth the additional API call.  Everything that follows is just 
me nerding out about the stats.
  
  I think the statistics here are a bit off.  Where is the STD statistic coming 
from?
  
  If you apply the standard error of a proportion, `sqrt(p(1-p)/n)`, to the 
roc_auc, you get `sqrt(0.965*(1-0.965)/5000)` = 0.002599038283673.  If you 
apply a proportion test to the difference noted in the final old/new feature 
comparison (which is unfair, but still interesting), you get an insignificant 
p-value of `0.24`.  So not statistically significant.  See 
https://www.infrrr.com/proportions/single-proportion-hypothesis-test-calculator 
 I used 0.964 as the null proportion and 0.975 as the sample proportion, with 
n = 5000.
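  
  A minimal sketch of the standard-error calculation above, in Python. The 
function name is only illustrative, and treating roc_auc as a proportion is 
the assumption made in this comment, not an established property of the 
metric.

```python
import math


def proportion_stderr(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n observations."""
    return math.sqrt(p * (1 - p) / n)


# roc_auc of 0.965 treated as a proportion, with n = 5000 as in the comment
se = proportion_stderr(0.965, 5000)
print(round(se, 6))  # 0.002599
```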



[Wikidata-bugs] [Maniphest] T261850: compare model accuracy with and without property suggester

2020-09-10 Thread Michael
Michael added a comment.


  Great! I'll make a pull request for removing it. 
  
  Removing property suggester also has the positive side effect that our scores 
for dumps and the API should be the same again. cc @Lydia_Pintscher



[Wikidata-bugs] [Maniphest] T261850: compare model accuracy with and without property suggester

2020-09-10 Thread Ladsgroup
Ladsgroup added a comment.


  In T261850#6449735, @Michael wrote:
  
  > Thank you for your thorough research. That means we can effectively drop 
property suggester? Not having to do that extra network request should speed 
some things up.
  
  Yes, I'd vote for removing it. It's not just an extra API call, it's also a 
rather slow one (removing it always speeds up my model training drastically).



[Wikidata-bugs] [Maniphest] T261850: compare model accuracy with and without property suggester

2020-09-10 Thread Michael
Michael added a comment.


  Thank you for your thorough research. That means we can effectively drop 
property suggester? Not having to do that extra network request should speed 
some things up.



[Wikidata-bugs] [Maniphest] T261850: compare model accuracy with and without property suggester

2020-09-09 Thread Ladsgroup
Ladsgroup added a comment.


  So I did a little bit of statistics. First I rebuilt the old model with the 
old features multiple times to build a distribution of roc_auc and other 
metrics it produced.
  
  - For roc_auc the mean is 0.965, the std is 0.000655
  - For accuracy, the mean is 0.921 and the std is 0.000663
  
  The z value for the change caused by the new features is 9.98 for roc_auc 
and 11.1 for accuracy. These are so big that no z tables have p-values for 
them (and online tools give plain zero for that z score), meaning that 
statistically it's effectively impossible for the new features to improve 
accuracy just by chance.
  
  For PS, on the other hand, the z score is 2.11 for roc_auc and 0.69 for 
accuracy, which according to z tables means the p-values are 17% and 24% 
respectively. So it's very likely that PS has no effect on model performance 
at all and that all changes are due to chance (a p-value is usually considered 
good enough if it's lower than 5% or 1%). This makes a lot of sense given that 
in some places adding PS seems to decrease performance instead.
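  
  As a rough sketch of the procedure described here, the z score and its 
p-value can be computed as below. The function names are illustrative, and 
the one-sided (upper-tail) convention is an assumption, chosen because it is 
consistent with the 24% quoted for z = 0.69. The means and stds listed above 
are rounded, so feeding them back in will not reproduce the quoted z values 
exactly.

```python
import math


def z_score(observed: float, mean: float, std: float) -> float:
    """Number of standard deviations between observed and the baseline mean."""
    return (observed - mean) / std


def upper_tail_p(z: float) -> float:
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical observed roc_auc, compared against the baseline distribution
z = z_score(observed=0.966, mean=0.965, std=0.000655)
print(z, upper_tail_p(z))

# p-value for the accuracy z score quoted for PS
print(round(upper_tail_p(0.69), 3))  # 0.245
```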



[Wikidata-bugs] [Maniphest] T261850: compare model accuracy with and without property suggester

2020-09-07 Thread Ladsgroup
Ladsgroup moved this task from Doing to Peer Review on the Item Quality Scoring 
Improvement (Item Quality Scoring Improvement - Sprint 3) board.
Ladsgroup added a comment.


  (All are micro averages, not macro.)
  
  Old-data, new features
  
  | Metric              | Without Property suggester | With Property suggester | Difference    |
  | False positive rate | 0.057                      | 0.058                   | 1.7% decrease |
  | Accuracy            | 0.929                      | 0.929                   | No difference |
  | roc_auc             | 0.972                      | 0.973                   | 0.1% increase |
  
  New data, new features
  
  | Metric              | Without Property suggester | With Property suggester | Difference        |
  | False positive rate | 0.115                      | 0.116                   | 0.87% increase(!) |
  | Accuracy            | 0.818                      | 0.817                   | 0.1% decrease(!)  |
  | roc_auc             | 0.864                      | 0.866                   | 0.2% increase     |
  
  All data combined, new features
  
  | Metric              | Without Property suggester | With Property suggester | Difference    |
  | False positive rate | 0.052                      | 0.052                   | No difference |
  | Accuracy            | 0.93                       | 0.931                   | 0.1% increase |
  | roc_auc             | 0.964                      | 0.965                   | 0.1% increase |
  
  As you can see, there's not much to be gained from the item completeness 
metric. My hypothesis is that it used to be useful when all of our features 
were broken. Let's try with one thing only:
  
  Old-data, old features:
  
  | Metric              | Without Property suggester | With Property suggester | Difference    |
  | False positive rate | 0.066                      | 0.066                   | No difference |
  | Accuracy            | 0.922                      | 0.923                   | 0.1% increase |
  | roc_auc             | 0.964                      | 0.965                   | 0.1% increase |
  
  Nope, no actual change :/ Also, in the commit that introduced it I can't 
find any improvement either: 
https://github.com/wikimedia/articlequality/commit/1d0feffdcecbbee6fa11903531edc5e4e91b41e3#diff-834e2f59d8597053582b57dc05d4c08e
  
  While we are here, it's nice to compare old features and new features:
  old-data, without property suggester
  
  | Metric              | Old features | New features | Difference    |
  | False positive rate | 0.066        | 0.057        | 14% decrease  |
  | Accuracy            | 0.922        | 0.929        | 0.8% increase |
  | roc_auc             | 0.964        | 0.972        | 0.8% increase |
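  
  For reference, the Difference column in this last table appears to be the 
relative change between the two values. A small sketch under that assumption 
(the function name is illustrative):

```python
def relative_change(old: float, new: float) -> float:
    """Change from old to new, as a fraction of old (negative = decrease)."""
    return (new - old) / old


# Rows of the old-features vs. new-features table above
for metric, old, new in [
    ("False positive rate", 0.066, 0.057),
    ("Accuracy", 0.922, 0.929),
    ("roc_auc", 0.964, 0.972),
]:
    print(f"{metric}: {relative_change(old, new):+.1%}")
```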
  
  It shows a sharp increase in accuracy. 1% might not seem like much, but keep 
in mind that we are in the long tail. To explain it better, we should flip the 
values for accuracy and ROC-AUC and look at inaccuracy instead: then you have 
a 9% decrease in inaccuracy. (The same value for adding property suggester 
with old data and old features is a 1.3% decrease in inaccuracy, which is 
still negligible IMO.)
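  
  The "flipped" calculation can be sketched like this, using the accuracy 
values from the tables above. The function name is illustrative only.

```python
def relative_error_reduction(old_score: float, new_score: float) -> float:
    """Relative decrease in (1 - score) when moving from old_score to new_score."""
    old_error = 1 - old_score
    new_error = 1 - new_score
    return (old_error - new_error) / old_error


# Old vs. new features, old data, without property suggester (accuracy)
print(f"{relative_error_reduction(0.922, 0.929):.1%}")  # 9.0%

# Without vs. with property suggester, old data, old features (accuracy)
print(f"{relative_error_reduction(0.922, 0.923):.1%}")  # 1.3%
```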



[Wikidata-bugs] [Maniphest] T261850: compare model accuracy with and without property suggester

2020-09-07 Thread Ladsgroup
Ladsgroup claimed this task.
Ladsgroup moved this task from To Do to Doing on the Item Quality Scoring 
Improvement (Item Quality Scoring Improvement - Sprint 3) board.
Restricted Application added a project: User-Ladsgroup.



[Wikidata-bugs] [Maniphest] T261850: compare model accuracy with and without property suggester

2020-09-02 Thread Lydia_Pintscher
Lydia_Pintscher moved this task from Backlog to Item Quality Scoring 
Improvement - Sprint 3 on the Item Quality Scoring Improvement board.
Lydia_Pintscher edited projects, added Item Quality Scoring Improvement (Item 
Quality Scoring Improvement - Sprint 3); removed Item Quality Scoring 
Improvement.

WORKBOARD
  https://phabricator.wikimedia.org/project/board/4932/



[Wikidata-bugs] [Maniphest] T261850: compare model accuracy with and without property suggester

2020-09-02 Thread Lydia_Pintscher
Lydia_Pintscher created this task.
Lydia_Pintscher added projects: Item Quality Scoring Improvement, Wikidata.

TASK DESCRIPTION
  **Problem:**
  The property suggester is only taken into account when scoring an Item live. 
It is not taken into account when scoring an Item in the dump. We want to 
understand better how the property suggester influences the quality scoring. In 
order to do that we compare the accuracy with and without the property 
suggester signal available.
  
  **Acceptance criteria:**
  
  [ ] We can see how much of an influence the property suggester signal has on 
the accuracy of the quality scoring.
