[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-07-11 Thread Ladsgroup
Ladsgroup added a comment.
https://github.com/wiki-ai/editquality/pull/165

TASK DETAIL
https://phabricator.wikimedia.org/T195701

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup
Cc: Aklapper, Halfak, Ladsgroup, matej_suchanek, Lydia_Pintscher, Lahi, Gq86, Vacio, bkowshik, GoranSMilovanovic, QZanden, LawExplorer, Avner, Mkdw, notconfusing, srodlund, Wikidata-bugs, aude, Alchimista, He7d3r, Mbch331, Rxy___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-07-11 Thread Ladsgroup
Ladsgroup added a comment.
Letting *is bot/was bot* take precedence seems the best approach to me. Will make a patch.
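
One way to picture "*is bot/was bot* takes precedence" is as an ordered set of checks in which bot status is consulted before the block condition. The sketch below is only an illustration under stated assumptions: the `Edit` fields and the `autolabel` function are hypothetical names, not the editquality implementation, and all other autolabeling rules are left out.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical record; the field names are assumptions for illustration."""
    user_is_bot: bool       # the user currently holds the bot flag
    user_was_bot: bool      # the user held the bot flag in the past
    user_was_blocked: bool  # the user has been blocked at some point


def autolabel(edit: Edit) -> str:
    """Return a label, checking bot status before the block condition."""
    # Bot status takes precedence: a block on the account does not
    # send a bot's edit to review.
    if edit.user_is_bot or edit.user_was_bot:
        return "trusted"
    if edit.user_was_blocked:
        return "needs_review"
    # Remaining rules (edit count, reverts, ...) are out of scope here.
    return "undecided"
```

With this ordering, an edit from a blocked bot account is labeled `trusted`, while an edit from a blocked non-bot account still goes to `needs_review`.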


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-07-10 Thread Halfak
Halfak added a comment.
Oohhh.  Hmm.  Yeah.  I wonder if we can adjust for block reason.  Or maybe let *is bot/was bot* take precedence.


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-07-10 Thread Ladsgroup
Ladsgroup added a comment.
I loaded it into Wiki labels and started labeling, but I encountered a funny problem. Most of the edits are okay and were made by bots of users who got blocked (a case that comes up often is MechQuesterBot). Should we do another round of autolabeling, but ignoring the block condition? That would drop 4.9K needs_review cases out of 6.6K, which means we probably need to go back to using the 500K sample to get a 5K sample for review.
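
The arithmetic behind this comment can be checked with a few lines. This is a minimal sketch that uses only the rounded figures quoted above (6.6K, 4.9K, 5K), so the results are approximate; the variable names are made up for illustration, and the scale-up factor assumes the share of review cases stays the same in a larger sample.

```python
needs_review_now = 6600            # approx. cases currently flagged for review
dropped_without_block_rule = 4900  # approx. cases that would no longer be flagged
target_review_size = 5000          # desired number of cases for review

# Cases that would still need review after re-running the autolabeling.
remaining = needs_review_now - dropped_without_block_rule

# How much larger the base sample would have to be to reach the target,
# assuming the same proportion of review cases.
scale_up = target_review_size / remaining

print(remaining)
print(round(scale_up, 2))
```

Roughly 1.7K cases would remain, well short of 5K, which is why a larger base sample comes up.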


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-07-09 Thread Halfak
Halfak added a comment.
Merged.  Ready for loading into Wiki labels.


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-06-28 Thread Halfak
Halfak added a comment.
I left some notes on the PR.  I think it is more complicated than necessary.


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-06-27 Thread Ladsgroup
Ladsgroup added a comment.
https://github.com/wiki-ai/editquality/pull/164


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-06-25 Thread Halfak
Halfak added a comment.
I think that should be the plan then.  Query for a random sample of 500k.  Then select *needs_review* from that set.


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-06-25 Thread Ladsgroup
Ladsgroup added a comment.
Last time we did it with 500K. I think that's enough.


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-06-21 Thread Halfak
Halfak added a comment.
How big of a sample do you think we would need in order to get enough "needs_review" samples?


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-06-21 Thread Halfak
Halfak added a comment.
We don't actually count all edits by people with 1000+ edits as good.  We check whether the edit was reverted, and if it was, it is included in the needs_review dataset.
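
The rule described here can be sketched as a small predicate. This is a hypothetical illustration, not the project's code: the function name, its parameters, and the exact handling of the boundary at 1000 edits are assumptions.

```python
TRUSTED_EDIT_COUNT = 1000  # threshold mentioned in the thread


def needs_review(user_edit_count: int, was_reverted: bool) -> bool:
    """Decide whether an edit goes to the needs_review set (sketch)."""
    # A reverted edit always goes to review, even from an experienced user.
    if was_reverted:
        return True
    # Otherwise only edits from users at or below the threshold need review.
    # Treating exactly 1000 edits as "not yet trusted" is an assumption.
    return user_edit_count <= TRUSTED_EDIT_COUNT
```

For example, `needs_review(5000, was_reverted=True)` is true, while the same user's unreverted edit is not sent to review.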


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-06-20 Thread Ladsgroup
Ladsgroup added a comment.
To make the dataset (sorta) balanced, we automatically mark edits made by users with more than 1K edits as trusted and not needing review (look at the Makefile). Wikidata is populated by bots (more than any other wiki), so to get a dataset to review we need to either:
1- Sample 500K, autolabel most of them using the edit count restriction, and pick the 2K for users to review.
2- Apply the edit count restriction in the sampling.
We don't use --pop-rate in Wikidata models, so getting the proportion right doesn't matter.
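
The two options can be contrasted in a short sketch. Everything here is an assumption made for illustration: edits are plain dicts with a hypothetical `user_edit_count` key, the function names are invented, and the real pipeline is driven by the Makefile rather than by code like this.

```python
import random


def sample_then_autolabel(edits, sample_size, review_size, threshold=1000, seed=0):
    """Option 1: draw a large random sample, then keep edits needing review."""
    rng = random.Random(seed)
    sample = rng.sample(edits, min(sample_size, len(edits)))
    # Edits by users above the threshold are autolabeled as trusted and dropped.
    candidates = [e for e in sample if e["user_edit_count"] <= threshold]
    # The sample is already in random order, so a prefix is a random subset.
    return candidates[:review_size]


def restrict_then_sample(edits, review_size, threshold=1000, seed=0):
    """Option 2: apply the edit count restriction first, then sample."""
    rng = random.Random(seed)
    candidates = [e for e in edits if e["user_edit_count"] <= threshold]
    return rng.sample(candidates, min(review_size, len(candidates)))
```

Option 1 can return fewer than `review_size` edits when trusted users dominate the sample, which is the concern with a bot-heavy wiki; option 2 returns the requested number whenever enough eligible edits exist.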


[Wikidata-bugs] [Maniphest] [Commented On] T195701: new ORES labeling campaign for Wikidata

2018-06-20 Thread Halfak
Halfak added a comment.
What's the purpose of the editcount restriction?