[Wikidata-bugs] [Maniphest] T261849: Benchmark old and new model accuracy on new labeled data

2020-09-20 Thread Lydia_Pintscher
Lydia_Pintscher closed this task as "Resolved".
Lydia_Pintscher added a comment.


  \o/

TASK DETAIL
  https://phabricator.wikimedia.org/T261849

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup, Lydia_Pintscher
Cc: GoranSMilovanovic, Aklapper, Lydia_Pintscher, guergana.tzatchkova, 
Hazizibinmahdi, Akuckartz, darthmon_wmde, Michael, Nandana, Lahi, Gq86, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Ladsgroup, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T261849: Benchmark old and new model accuracy on new labeled data

2020-09-16 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  Current state: Lydia needs to review. Amir will explain :D

To: Ladsgroup, Lydia_Pintscher
Cc: GoranSMilovanovic, Aklapper, Lydia_Pintscher, guergana.tzatchkova, 
Hazizibinmahdi, Akuckartz, darthmon_wmde, Michael, Nandana, Lahi, Gq86, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Ladsgroup, Mbch331


[Wikidata-bugs] [Maniphest] T261849: Benchmark old and new model accuracy on new labeled data

2020-09-14 Thread Michael
Michael closed subtask T261850: compare model accuracy with and without 
property suggester as Resolved.

To: Ladsgroup, Michael
Cc: GoranSMilovanovic, Aklapper, Lydia_Pintscher, guergana.tzatchkova, 
Hazizibinmahdi, Akuckartz, darthmon_wmde, Michael, Nandana, Lahi, Gq86, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Ladsgroup, Mbch331


[Wikidata-bugs] [Maniphest] T261849: Benchmark old and new model accuracy on new labeled data

2020-09-10 Thread Ladsgroup
Ladsgroup moved this task from Doing to Peer Review on the Item Quality Scoring 
Improvement (Item Quality Scoring Improvement - Sprint 3) board.
Ladsgroup added a comment.


  So I took a sample of 750 current items in Wikidata. It is a stratified 
sample, 150 items from each size range; otherwise it would be mostly scholarly 
articles and similar items. The query: https://quarry.wmflabs.org/query/47988. 
The sample is one tenth of what I queried for the new labeling campaign (that's 
why the query returns 7500).
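
  The stratified sampling described above can be sketched roughly as follows. 
This is a minimal sketch, not the actual Quarry query: the five size buckets and 
their boundaries are hypothetical (only the 150-per-range figure and the 750 
total come from the comment), and items are assumed to arrive as 
(item id, page size) pairs.

```python
import random
from collections import defaultdict

# Hypothetical size buckets in bytes (low inclusive, high exclusive, None = no
# upper bound). The real ranges used by the Quarry query are not given here.
BUCKETS = [(0, 2_000), (2_000, 5_000), (5_000, 10_000), (10_000, 50_000), (50_000, None)]


def bucket_of(size):
    """Return the index of the bucket that `size` falls into, or None."""
    for i, (low, high) in enumerate(BUCKETS):
        if size >= low and (high is None or size < high):
            return i
    return None


def stratified_sample(items, per_bucket=150, seed=0):
    """Draw up to `per_bucket` item ids from each size bucket.

    `items` is an iterable of (item_id, page_size) pairs (an assumed shape).
    Buckets with fewer than `per_bucket` items contribute all of their items.
    """
    groups = defaultdict(list)
    for item_id, size in items:
        b = bucket_of(size)
        if b is not None:
            groups[b].append(item_id)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for b in range(len(BUCKETS)):
        pool = groups[b]
        sample.extend(rng.sample(pool, min(per_bucket, len(pool))))
    return sample
```

  With five buckets of 150 each, this yields 750 items, and small items (which 
would otherwise dominate a uniform sample) can make up at most one fifth of it.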
  
  Then I got predictions from three different models: 1- Old model, 2- New 
model with the PropertySuggester (PS) feature, and 3- New model without PS. In 
215 cases (about 29%), the three models did not all agree. Out of those 215, 
187 were disagreements between the old model and the new ones only (the two 
new models agreed regardless of the PS feature). **You can find them in P12527#70026** 
(it's rather big), but the results look good: for example, revision 967989666 
was judged D by the old model and B by the new ones. The remaining 28 cases 
(3.7%) were disagreements that depended on the PS feature. This is the list of 
those 28 cases:
  
  | Rev id | Old model | New model with PS | New model without PS | Class probabilities per model |
  | --- | --- | --- | --- | --- |
  | 1245749762 | C | B | C | {"old": {"A": 0.033922905209765215, "B": 0.3156294884755486, "C": 0.6357983498027796, "D": 0.011791632271694597, "E": 0.0028576242402119953}, "new_ps": {"A": 0.07506341154975878, "B": 0.5157792612102473, "C": 0.39929552483486047, "D": 0.00611359009974234, "E": 0.0037482123053910747}, "new_without_ps": {"A": 0.10080501376933476, "B": 0.4091825660012928, "C": 0.4778178499618486, "D": 0.007443348708474312, "E": 0.0047512215590494004}} |
  | 1268403491 | A | A | B | {"old": {"A": 0.9581183283251842, "B": 0.028314702806298882, "C": 0.010957647415122833, "D": 0.0013961524223473987, "E": 0.0012131690310467787}, "new_ps": {"A": 0.47172335609710603, "B": 0.4534614216634016, "C": 0.05535875041401178, "D": 0.011371950117362214, "E": 0.008084521708118674}, "new_without_ps": {"A": 0.3623044878833989, "B": 0.5702558222515018, "C": 0.05067867816474142, "D": 0.009853985767714471, "E": 0.006907025932643356}} |
  | 1272228817 | C | B | A | {"old": {"A": 0.05489149933518188, "B": 0.05466214769030068, "C": 0.8814578719623226, "D": 0.006414714358706056, "E": 0.002573766653488728}, "new_ps": {"A": 0.29595415449014023, "B": 0.35019429930247076, "C": 0.33307570056966107, "D": 0.012191086088356676, "E": 0.00858475954937127}, "new_without_ps": {"A": 0.34849674768953015, "B": 0.32551627851465303, "C": 0.30496550210767087, "D": 0.012390480540117786, "E": 0.008630991148028103}} |
  | 1032620898 | E | D | E | {"old": {"A": 0.0008107712998579739, "B": 0.001213658252819447, "C": 0.011217407965526772, "D": 0.0507681680549962, "E": 0.9359899944267996}, "new_ps": {"A": 0.004039595060549861, "B": 0.004426804158760506, "C": 0.021114505241631394, "D": 0.7165181136601142, "E": 0.2539009818789441}, "new_without_ps": {"A": 0.0030297197261097805, "B": 0.0038494886036259317, "C": 0.012387228657753662, "D": 0.4631927631616361, "E": 0.5175407998508744}} |
  | 954223095 | C | C | B | {"old": {"A": 0.007334705016875339, "B": 0.008463184659439892, "C": 0.8880210643634205, "D": 0.0914071181854589, "E": 0.004773927774805269}, "new_ps": {"A": 0.026242218233272874, "B": 0.3733216836274356, "C": 0.40173521200105566, "D": 0.19222363724692618, "E": 0.0064772488913097965}, "new_without_ps": {"A": 0.022162480661051785, "B": 0.39375421587841725, "C": 0.3787948617705969, "D": 0.19888459299483738, "E": 0.006403848695096642}} |
  | 1272813710 | B | A | B | {"old": {"A": 0.01246413972025038, "B": 0.9294193476502699, "C": 0.051001211254504526, "D": 0.0039517721845516275, "E": 0.0031635291904234795}, "new_ps": {"A": 0.5637827527490271, "B": 0.40895553231316906, "C": 0.021699633403881608, "D": 0.0035228124820841796, "E": 0.002039269051838088}, "new_without_ps": {"A": 0.46526672188978885, "B": 
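
  The 215 / 187 / 28 breakdown above is a partition of the sample by how the 
three predicted classes relate, which can be sketched as follows. This is a 
hedged reconstruction, not the script that was actually run: it assumes each 
model's predicted class is the one with the highest probability in the JSON 
column of the table (this holds for every complete row listed), and the 
function names and input shape are hypothetical.

```python
from collections import Counter

MODELS = ("old", "new_ps", "new_without_ps")


def predicted_class(probabilities):
    """Return the class with the highest probability, e.g. {"A": 0.1, "B": 0.9} -> "B"."""
    return max(probabilities, key=probabilities.get)


def disagreement_kind(old, new_ps, new_without_ps):
    """Classify one revision by how its three predicted classes relate.

    "agree":        all three models give the same class.
    "old_vs_new":   the two new models agree, only the old model differs.
    "ps_dependent": the two new models disagree, i.e. the PS feature matters.
    """
    if old == new_ps == new_without_ps:
        return "agree"
    if new_ps == new_without_ps:
        return "old_vs_new"
    return "ps_dependent"


def summarize(scores):
    """Count kinds over {rev_id: {"old": probs, "new_ps": probs, "new_without_ps": probs}}."""
    counts = Counter()
    for by_model in scores.values():
        classes = [predicted_class(by_model[m]) for m in MODELS]
        counts[disagreement_kind(*classes)] += 1
    return counts
```

  Under this partition, a revision where all three models differ, such as 
1272228817 (C, B, A), counts as PS-dependent because the two new models disagree 
with each other, and 187 + 28 adds up to the 215 total.
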

[Wikidata-bugs] [Maniphest] T261849: Benchmark old and new model accuracy on new labeled data

2020-09-08 Thread Ladsgroup
Ladsgroup claimed this task.
Ladsgroup moved this task from To Do to Doing on the Item Quality Scoring 
Improvement (Item Quality Scoring Improvement - Sprint 3) board.
Restricted Application added a project: User-Ladsgroup.

WORKBOARD
  https://phabricator.wikimedia.org/project/board/4952/

To: Ladsgroup
Cc: GoranSMilovanovic, Aklapper, Lydia_Pintscher, guergana.tzatchkova, 
Hazizibinmahdi, Akuckartz, darthmon_wmde, Michael, Nandana, Lahi, Gq86, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Ladsgroup, Mbch331


[Wikidata-bugs] [Maniphest] T261849: Benchmark old and new model accuracy on new labeled data

2020-09-02 Thread Lydia_Pintscher
Lydia_Pintscher moved this task from Backlog to Item Quality Scoring 
Improvement - Sprint 3 on the Item Quality Scoring Improvement board.
Lydia_Pintscher edited projects, added Item Quality Scoring Improvement (Item 
Quality Scoring Improvement - Sprint 3); removed Item Quality Scoring 
Improvement.

WORKBOARD
  https://phabricator.wikimedia.org/project/board/4932/

To: Lydia_Pintscher
Cc: GoranSMilovanovic, Aklapper, Lydia_Pintscher, guergana.tzatchkova, 
Akuckartz, darthmon_wmde, Michael, Nandana, Lahi, Gq86, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Ladsgroup, Mbch331


[Wikidata-bugs] [Maniphest] T261849: Benchmark old and new model accuracy on new labeled data

2020-09-02 Thread Lydia_Pintscher
Lydia_Pintscher set the point value for this task to "3".

To: Lydia_Pintscher
Cc: Aklapper, Lydia_Pintscher, guergana.tzatchkova, Akuckartz, darthmon_wmde, 
Michael, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Ladsgroup, Mbch331


[Wikidata-bugs] [Maniphest] T261849: Benchmark old and new model accuracy on new labeled data

2020-09-02 Thread Lydia_Pintscher
Lydia_Pintscher created this task.
Lydia_Pintscher added projects: Item Quality Scoring Improvement, Wikidata.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  **Problem:**
  We would like to see whether the new model predicts the quality of Items 
better than the old model. To do this, we want to check how the old and new 
models perform on the new training data we collected.
  
  **Acceptance criteria:**
  
  [ ] we have an overview of how many Items the old model judges to be 
A/B/C/D/E class compared to the human judgement
  [ ] we have an overview of how many Items the new model judges to be 
A/B/C/D/E class compared to the human judgement
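
  Both acceptance criteria amount to a per-class tally of model output against 
human labels, which a confusion matrix captures directly. A minimal sketch, 
assuming the human judgements and the model predictions are both available as 
item-id-to-class mappings; the function name and input shape are hypothetical 
and not part of ORES.

```python
from collections import Counter

CLASSES = ["A", "B", "C", "D", "E"]


def class_overview(human, predicted):
    """Compare model classes with human labels on the items both cover.

    `human` and `predicted` map item id -> class letter (an assumed shape).
    Returns (human class counts, model class counts, confusion matrix, accuracy).
    """
    common = [item for item in human if item in predicted]
    human_counts = Counter(human[i] for i in common)
    model_counts = Counter(predicted[i] for i in common)
    # confusion[h][p] = number of items humans labelled h and the model labelled p
    confusion = {h: {p: 0 for p in CLASSES} for h in CLASSES}
    for i in common:
        confusion[human[i]][predicted[i]] += 1
    correct = sum(confusion[c][c] for c in CLASSES)
    accuracy = correct / len(common) if common else 0.0
    return human_counts, model_counts, confusion, accuracy
```

  Running it once with the old model's predictions and once with the new 
model's gives the two overviews side by side; the off-diagonal cells show which 
classes each model over- or under-predicts.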

To: Lydia_Pintscher
Cc: Aklapper, Lydia_Pintscher, guergana.tzatchkova, Akuckartz, darthmon_wmde, 
Michael, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Ladsgroup, Mbch331