diego closed subtask T341820: Evaluate and improve the Revert Risk model for
Wikidata. as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T328813
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: diego
Cc: Michael, calbon, achou, MunizaA
diego added a parent task: T341820: Evaluate and improve the Revert Risk model
for Wikidata..
TASK DETAIL
https://phabricator.wikimedia.org/T343419
To: diego
Cc: diego, achou, Arian_Bozorg, Ladsgroup
diego added a comment.
Also the experimental model is available through the Knowledge Integrity
package <https://gitlab.wikimedia.org/repos/research/knowledge_integrity>.
Here is an example Python notebook showing how to use it from PAWS (or from
your local machine).
<https
diego added a comment.
And if you want to help with the evaluation, please go to this site:
https://annotool.toolforge.org/ and help us to annotate data :)
TASK DETAIL
https://phabricator.wikimedia.org/T343419
diego added a comment.
In T343419#9068806 <https://phabricator.wikimedia.org/T343419#9068806>,
@achou wrote:
> @elukey Research team's plan for the RevertRisk Wikidata model is to
evaluate it in Q1, and then improve and deploy it in Q2.
I can confirm this!
TASK DETAI
diego closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T333892
To: diego
Cc: Isaac, achou, Lydia_Pintscher, MunizaA, Aklapper, leila, mrephabricator,
KinneretG, Ast
diego added a comment.
**Weekly Updates**
- The Wikidata Revert Risk model is now available for testing on this PAWS
notebook
<https://public-paws.wmcloud.org/User:Diego_(WMF)/WikidataRevertRisk/wikidata_ki_example_notebook.ipynb>.
I'm going to resolve this task a
diego added a subscriber: Isaac.
diego added a comment.
**Weekly Updates**
- @MunizaA has released an alpha version of the evaluation tool. Results for
Wikidata Model can be found here <https://annotool.toolforge.org/projects/6>.
- For Wikidata Revert Risk, I'm going to
diego added a comment.
**Weekly updates**
- I'm currently working on the Model Card for this algorithm.
- @MunizaA please notify us in this ticket when the annotation tool app is
ready.
- We are preparing the code to be shared with @Lydia_Pintscher and (through
her) with volunteer
diego added a comment.
**Weekly Updates**
- We have met with Lydia and community developers. We are going to share our
code with them, and we have also learned about their efforts on automatic content
patrolling in Wikidata.
- The evaluation tool code is ready, this week @MunizaA would
diego added a comment.
**Weekly Updates**
- We are still working on the evaluation tool.
TASK DETAIL
https://phabricator.wikimedia.org/T333892
To: diego
Cc: achou, Lydia_Pintscher, MunizaA
diego added a comment.
**Weekly Updates**
- @MunizaA is working on an evaluation tool that will be usable by all the
Revert Risk models, including the Wikidata one as well as the LA and
Multilingual models for Wikipedia.
TASK DETAIL
https://phabricator.wikimedia.org/T333892
diego added a comment.
**Weekly Updates**
- The model card for Multilingual model is available here
<https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Multilingual_revert_risk_model_card>.
- We are working with Lydia to evaluate the model, and update if needed.
diego added a subscriber: achou.
diego added a comment.
**Weekly Updates**
- The first version of this model is ready to go to LiftWing.
- @MunizaA has submitted a merge request
<https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/merge_requests/16>.
Now
diego added a comment.
**Weekly Updates**
- We are finalizing the feature extraction pipeline code and the code to
serve the model on LiftWing.
TASK DETAIL
https://phabricator.wikimedia.org/T333892
diego added a comment.
**Weekly Updates**
- We have developed a meta-model. This model has two main components.
- The first one is a CatBoost-based classifier, designed to assess the
Revert Risk for claims set and updates.
- The second model is a hybrid approach, designed
diego added a comment.
**Weekly updates**
- @MunizaA has created an efficient pipeline to train HuggingFace
Transformers, using the GPUs from the stat machines, and data coming from the
Data Lake.
- We are experimenting with different LLMs, such as mBERT and RoBERTa, to
detect
diego added a subscriber: MunizaA.
diego added a comment.
**Weekly Updates**
- @MunizaA has been testing the feasibility and utility of using Wikidata
Embeddings, both for Item Quality and Revert Risk. We have studied different
implementations, and are experimenting with the PyTorch
diego added a comment.
**Update**
- I'm testing a Deep Learning approach to see if it offers relevant advantages
over the current XGBoost model.
TASK DETAIL
https://phabricator.wikimedia.org/T328813
diego added subscribers: Isaac, diego.
diego added a comment.
@Michael FYI:
@Isaac has made interesting progress on automatic Wikidata Item Quality
evaluation T321224 <https://phabricator.wikimedia.org/T321224>. Also, I'm
leading another effort on vandalism detection on Wikidata T
diego added a comment.
**Update**
- New features have slightly improved the accuracy (now 75%). I'm still
working on improving the model.
TASK DETAIL
https://phabricator.wikimedia.org/T328813
diego added a comment.
**Update**
- Currently I'm working on feature engineering. The current model has
around 72% accuracy on balanced data.
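For readers wondering why the accuracy is reported on balanced data: with equally sized classes, a trivial classifier scores 50%, so the figure is easier to interpret. The snippet below is only a minimal sketch of that idea, not the project's evaluation code. The function name, the downsampling strategy and the toy labels are all assumptions made for illustration.

```python
import random


def accuracy_on_balanced_sample(labels, predictions, seed=0):
    """Accuracy after downsampling so both classes have the same size."""
    rng = random.Random(seed)
    reverted = [i for i, y in enumerate(labels) if y == 1]
    kept = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(reverted), len(kept))
    if n == 0:
        raise ValueError("need at least one example of each class")
    # Draw the same number of examples from each class.
    sample = rng.sample(reverted, n) + rng.sample(kept, n)
    correct = sum(labels[i] == predictions[i] for i in sample)
    return correct / len(sample)


# Toy data (made up): 2 reverted edits out of 8, and a hypothetical
# model that always predicts "not reverted".
labels = [1, 1, 0, 0, 0, 0, 0, 0]
always_negative = [0] * len(labels)

raw_accuracy = sum(y == p for y, p in zip(labels, always_negative)) / len(labels)
balanced_accuracy = accuracy_on_balanced_sample(labels, always_negative)
print(raw_accuracy, balanced_accuracy)
```

On the imbalanced toy data the always-negative model looks good on raw accuracy, while on the balanced sample it falls back to chance level.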
TASK DETAIL
https://phabricator.wikimedia.org/T328813
diego added a comment.
**Update**
- Still working on the data evaluation. Currently I'm studying the use of
tags and user groups and their relation with reverts.
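One simple way to look at the relation between user groups and reverts is a revert rate per group. The sketch below only illustrates that kind of aggregation. The record layout (`user_groups`, `reverted`) and the sample revisions are hypothetical and do not come from the actual dataset.

```python
from collections import defaultdict


def revert_rate_by_group(revisions):
    """Share of reverted revisions for each user group."""
    totals = defaultdict(int)
    reverts = defaultdict(int)
    for rev in revisions:
        # A user can belong to several groups; count the revision for each.
        for group in rev["user_groups"]:
            totals[group] += 1
            if rev["reverted"]:
                reverts[group] += 1
    return {group: reverts[group] / totals[group] for group in totals}


# Made-up revisions, for illustration only.
revisions = [
    {"user_groups": ["anonymous"], "reverted": True},
    {"user_groups": ["anonymous"], "reverted": False},
    {"user_groups": ["bot"], "reverted": False},
    {"user_groups": ["autoconfirmed", "rollbacker"], "reverted": False},
]
rates = revert_rate_by_group(revisions)
print(rates)
```

The same aggregation would work for edit tags by iterating over a list of tags instead of groups.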
TASK DETAIL
https://phabricator.wikimedia.org/T328813
diego added a comment.
I'm trying to implement a link-prediction task on Wikidata, to be used as
a proxy for claims coverage. I'm building on top of Goyal & Ferrara
<https://arxiv.org/pdf/1705.02801.pdf>'s work. The existing libraries might
require some tweaks to work on the ful
diego added a subscriber: Lydia_Pintscher.
diego added a comment.
Hey @DAbad, as part of this proposal
<https://docs.google.com/document/d/1qAF7nJNAMw3yOwoKP2HuvkpuhwjaWs-BX9dSGRTkJc8/edit?usp=sharing>,
I'm in conversations with @Lydia_Pintscher and @calbon to develop new
diego added a comment.
**Updates**
- We finished this project; results can be found on Meta
<https://meta.wikimedia.org/wiki/Research:Identifying_Controversial_Content_in_Wikidata>,
and the code and models can be found on GitLab
<https://gitlab.wikimedia.org/repos
diego closed this task as "Resolved".
diego updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Ast
diego added a comment.
**Updates**
- I was comparing the results when adding anonymous edits; so far I
haven't found major differences from the previous results. I'll continue working
on this during the next week before my next meeting with Lydia.
TASK DETAIL
https
diego added a comment.
**Updates**
- I presented the main results of this work during the Tuesday Research
Sessions; slides can be found here
<https://docs.google.com/presentation/d/1JUqUqhlwPwCx6koy5t8oKiYC4flHNEUrpBMUkvt76xE/edit?usp=sharing>.
TASK DETAIL
diego added a comment.
**Updates**
- We met with Lydia and discussed the current results.
- We reviewed the results, confirming that most co-edited items correspond to
ongoing events, even when we change the time window considered.
- Now, I'll be studying the relevance
diego added a comment.
**Updates**
- No updates
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, karapayneWMDE, Invadibot
diego added a comment.
**Updates**
- I'm working on identifying collaborative edits on Wikidata items not
related to current events.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
diego added a comment.
**Updates**
- No updates
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja
diego added a comment.
**Updates**
- We are now focusing on understanding collaboration patterns: when/how more
than one user edits the same item in a given period of time.
- We found that in Wikidata such collaborations are less frequent than in
other Wikimedia projects.
- We also
diego added a comment.
**Updates**
- I'm organizing the new results to be discussed with the stakeholder.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo, leila
diego moved this task from FY2021-22-Research-Oct-Dec to
FY2021-22-Research-Jan-March on the Research board.
diego edited projects, added Research (FY2021-22-Research-Jan-March); removed
Research (FY2021-22-Research-Oct-Dec).
TASK DETAIL
https://phabricator.wikimedia.org/T287946
WORKBOARD
diego added a comment.
**Updates**
- I'm focusing on modeling the relationship between topics and
collaborations/controversies.
- I'm working on graph representation of these components
TASK DETAIL
https://phabricator.wikimedia.org/T287946
diego added a comment.
**Updates**
- We have seen that few items are edited by more than one user.
- We are currently researching the item and user characteristics
related to collaborative work.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
diego added a comment.
**Updates**
- No updates this week. I'm going to meet with the stakeholder next week.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo, leila
diego added a comment.
**Updates**
- I've been working on a classifier to predict reverts.
- The current classifier uses article (item), revision and user information.
- On a balanced test set, the current model achieves over 70% accuracy.
- However, there is a set
diego updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja,
Akuckartz, Nandana
diego added a comment.
**Updates**
- Working on modeling the reverting behavior.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper
diego added a comment.
**Updates**
No updates this week.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja
diego added a comment.
**Updates**
- Preliminary results presented to our stakeholder.
- In the next weeks we will focus on a deeper understanding of reverting
behavior.
**TODO**
- Update meta page (within the next 3 weeks)
TASK DETAIL
https://phabricator.wikimedia.org
diego added a comment.
**Updates**
We presented this work at the TTO'21 conference
<https://truthandtrustonline.com/>. We received interesting feedback, including
questions about the definition of controversial content. Some potential
collaboration for a second round on this re
diego moved this task from FY2021-22-Research-July-Sept to
FY2021-22-Research-Oct-Dec on the Research board.
diego edited projects, added Research (FY2021-22-Research-Oct-Dec); removed
Research (FY2021-22-Research-July-Sept).
TASK DETAIL
https://phabricator.wikimedia.org/T287946
WORKBOARD
diego added a comment.
**Updates**
- I've started gathering and organizing the different results, to write a
first report.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo
diego added a comment.
**updates**
I've created a page
<https://meta.wikimedia.org/wiki/Research:Identifying_Controversial_Content_in_Wikidata>
on meta about this project. In the following weeks I'll be uploading some of
the analysis and main results there.
TASK DETAIL
diego added a comment.
**Updates**
- I've been crunching data to study the "disputed by" qualifier. The plan is
to have some statistics on this and compare them with the revert behavior.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
diego added a comment.
**Updates**
No updates this week.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja
diego added a comment.
**Updates**
- I've been running analysis on the predictability of reverts on Wikidata,
including page, user and edit characteristics such as the property and the
action summary explained above.
- Probably not surprisingly, I've found that the user characteristics
diego added a comment.
**Updates**
- I'm focusing on reverted revisions.
- Developed a methodology to characterize Wikidata edits according to different
dimensions, such as the property edited, the edit type (from edit summaries),
and user characteristics.
(popular edit types
diego added a comment.
No updates this week.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja,
Akuckartz
diego added a comment.
@Lydia_Pintscher , regarding your question about the number of users
co-editing a Wikidata page, I found that for all edits to namespace 0, in July
2021, considering items that have at least one sitelink:
- 84% of pages were edited just by one user.
- 14
diego added a comment.
As a very initial exploration, we analyzed a subset of Wikidata items,
categorized them by topic, and checked which of them received more **updates**,
as a proxy for controversiality.
More specifically,
- We selected all the Wikidata items with sitelinks
diego triaged this task as "High" priority.
TASK DETAIL
https://phabricator.wikimedia.org/T287946
To: diego
Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja,
Akuckart
diego created this task.
diego added projects: Wikidata, Research (FY2021-22-Research-July-Sept).
Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION
The aim of this project is to identify controversial content in Wikidata.
Specifically we will develop the following tasks
diego added a comment.
I see. I was asking because we wrote these addresses in published papers, and
those are immutable. But if it is not possible, it is not possible.
TASK DETAIL
https://phabricator.wikimedia.org/T272192
diego added a comment.
Would it be possible to add redirects from the old URLs to the new ones?
TASK DETAIL
https://phabricator.wikimedia.org/T272192
To: GoranSMilovanovic, diego
Cc: diego, WMDE-leszek
diego added a comment.
https://dl.acm.org/doi/abs/10.1145/3366424.3383571
TASK DETAIL
https://phabricator.wikimedia.org/T204438
To: diego
Cc: diego, Hjfocs, Nandana, GoranSMilovanovic, Aklapper
diego added a comment.
Hi all
This problem is called Natural Language Inference (NLI), also known as textual
entailment. It is a super hot problem now in the NLP community, but imho
research is still far away from producing usable tools in the Wikipedia
context. This also requires
diego added a comment.
@leila I see some overlap although this task seems to be broader than the one
I'm working on. Given that I don't see much documentation or code about this
task, I prefer not to take responsibility for this.
TASK DETAIL
https://phabricator.wikimedia.org/T155560
diego added a comment.
I think we are talking about three different things:
i) page_id -> CurrentWikidataItem: this was my original request, and I think
@JAllemandou 's script solves this issue. Having that table updated would be
great.
ii) revision_id-> CurrentWikidataItem: Th
diego added a comment.
@JAllemandou, yes. Having this by revision would be great!
TASK DETAIL
https://phabricator.wikimedia.org/T215616
To: diego
Cc: Isaac, Tbayer, jcrespo, EBernhardson, Halfak, Nuria, JAllemandou
diego added a comment.
@Tbayer, great. Thanks.
TASK DETAIL
https://phabricator.wikimedia.org/T215616
To: diego
Cc: Tbayer, jcrespo, EBernhardson, Halfak, Nuria, JAllemandou, diego, Nandana, Akovalyov, Banyek, AndyTan
diego added a comment.
@jcrespo, the API works well for querying specific pages/entities, but not, for example, for finding which pages that exist in X_wiki are missing from Y_wiki.
My point here is that the Wikidata identifier is currently the main identifier for a page/concept, and that this fact
diego added a comment.
@EBernhardson, this looks like exactly what I was looking for initially. Thank you very much for that.
However, I won't close this task, because wikibase_item is still missing the page_id information. Joining by page_title does not seem very 'healthy'. We should keep
diego added a comment.
Looks good @JAllemandou, thanks.
This is a good workaround, but imho we should have a structure or schema that makes this kind of task easier, especially for people outside without access to a cluster.
TASK DETAIL
https://phabricator.wikimedia.org/T215616
diego added a project: Wikidata.
TASK DETAIL
https://phabricator.wikimedia.org/T215616
To: diego
Cc: Nuria, JAllemandou, diego, Nandana, Akovalyov, AndyTan, Lahi, Gq86, GoranSMilovanovic, QZanden, Marostegui
diego added a comment.
Hi @chelsyx ,
Check this notebook; apparently the number of white spaces is a pretty good indicator of the filename quality.
TASK DETAIL
https://phabricator.wikimedia.org/T182849
To: chelsyx
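The whitespace signal mentioned in the comment above could be computed along these lines. This is a minimal sketch and not the notebook's code. The function name is hypothetical, and treating underscores as spaces is an assumption based on how MediaWiki titles are stored.

```python
def count_whitespace(filename):
    """Count whitespace characters in the name part of a file name."""
    # Drop the extension, if there is one.
    stem = filename.rsplit(".", 1)[0]
    # Assumption: underscores stand in for spaces, as in MediaWiki titles.
    return sum(ch.isspace() for ch in stem.replace("_", " "))


# Made-up file names, for illustration only.
camera_default = count_whitespace("DSC01234.jpg")
descriptive = count_whitespace("Sunset over the Golden Gate Bridge.jpg")
with_underscores = count_whitespace("Eiffel_Tower_at_night.png")
print(camera_default, descriptive, with_underscores)
```

A camera-generated name has no separators at all, while a descriptive name has several, which is the intuition behind using this count as a quality feature.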
diego added a comment.
Hi,
Kateryna is working on this: https://meta.wikimedia.org/wiki/Research:Matching_Red_Links_with_Wikidata_Items
Please ping or write something on the discussion page if you want to know more about that project.
TASK DETAIL
https://phabricator.wikimedia.org/T178249