[Wikidata-bugs] [Maniphest] T328813: Develop a ML-based service to detect vandalism on Wikidata

2024-04-29 Thread diego
diego closed subtask T341820: Evaluate and improve the Revert Risk model for Wikidata. as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T328813 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Michael, calbon, achou, MunizaA

[Wikidata-bugs] [Maniphest] T343419: Move Wikidata tools to Lift Wing

2023-08-04 Thread diego
diego added a parent task: T341820: Evaluate and improve the Revert Risk model for Wikidata.. TASK DETAIL https://phabricator.wikimedia.org/T343419 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: diego, achou, Arian_Bozorg, Ladsgroup

[Wikidata-bugs] [Maniphest] T343419: Move Wikidata tools to Lift Wing

2023-08-04 Thread diego
diego added a comment. Also the experimental model is available through the Knowledge Integrity package <https://gitlab.wikimedia.org/repos/research/knowledge_integrity>. Here you have an example Python notebook on how to use it from PAWS (or from your local machine). <https

[Wikidata-bugs] [Maniphest] T343419: Move Wikidata tools to Lift Wing

2023-08-04 Thread diego
diego added a comment. And if you want to help with the evaluation, please go to this site: https://annotool.toolforge.org/ and help us to annotate data :) TASK DETAIL https://phabricator.wikimedia.org/T343419 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T343419: Move Wikidata tools to Lift Wing

2023-08-04 Thread diego
diego added a comment. In T343419#9068806 <https://phabricator.wikimedia.org/T343419#9068806>, @achou wrote: > @elukey Research team's plan for the RevertRisk Wikidata model is to evaluate it in Q1, and then improve and deploy it in Q2. I can confirm this! TASK DETAI

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-07-07 Thread diego
diego closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T333892 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Isaac, achou, Lydia_Pintscher, MunizaA, Aklapper, leila, mrephabricator, KinneretG, Ast

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-07-07 Thread diego
diego added a comment. **Weekly Updates** - The Wikidata Revert Risk model is now available for testing on this PAWS notebook <https://public-paws.wmcloud.org/User:Diego_(WMF)/WikidataRevertRisk/wikidata_ki_example_notebook.ipynb>. I'm going to resolve this task a

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-06-30 Thread diego
diego added a subscriber: Isaac. diego added a comment. **Weekly Updates** - @MunizaA has released an alpha version of the evaluation tool. Results for Wikidata Model can be found here <https://annotool.toolforge.org/projects/6>. - For Wikidata Revert Risk, I'm going to

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-06-16 Thread diego
diego added a comment. **Weekly updates** - I'm currently working on the Model Card for this algorithm. - @MunizaA please notify us in this ticket when the annotation tool app is ready. - We are preparing the code to be shared with @Lydia_Pintscher and (through her) with volunteer

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-06-12 Thread diego
diego added a comment. **Weekly Updates** - We have met with Lydia and community developers. We are going to share our code with them and we have also learned about their efforts on automatic content patrolling in Wikidata. - The evaluation tool code is ready; this week @MunizaA would

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-06-02 Thread diego
diego added a comment. **Weekly Updates** - We are still working on the evaluation tool. TASK DETAIL https://phabricator.wikimedia.org/T333892 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: achou, Lydia_Pintscher, MunizaA

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-05-26 Thread diego
diego added a comment. **Weekly Updates** - @MunizaA is working on an evaluation tool that would be usable by all the Revert Risk models, including the Wikidata one as well as the LA and Multilingual models for Wikipedia. TASK DETAIL https://phabricator.wikimedia.org/T333892 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-05-14 Thread diego
diego added a comment. **Weekly Updates** - The model card for the Multilingual model is available here <https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Multilingual_revert_risk_model_card>. - We are working with Lydia to evaluate the model, and update it if needed.

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-05-05 Thread diego
diego added a subscriber: achou. diego added a comment. **Weekly Updates** - The first version of this model is ready to go to LiftWing. - @MunizaA has submitted a merge request <https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/merge_requests/16>. Now

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-04-28 Thread diego
diego added a comment. **Weekly Updates** - We are finalizing the feature extraction pipeline code and the code to serve the model on LiftWing. TASK DETAIL https://phabricator.wikimedia.org/T333892 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-04-20 Thread diego
diego added a comment. **Weekly Updates** - We have developed a meta-model. This model has two main components. - The first one is a CatBoost-based classifier, designed to assess the Revert Risk for claims set and updates. - The second model is a hybrid approach, designed

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-04-14 Thread diego
diego added a comment. **Weekly updates** - @MunizaA has created an efficient pipeline to train HuggingFace Transformers, using the GPUs from the stat machines, and data coming from the Data Lake. - We are experimenting with different LLMs, such as mBERT and RoBERTa, to detect

[Wikidata-bugs] [Maniphest] T333892: Develop a new generation of ML models for Wikidata

2023-04-07 Thread diego
diego added a subscriber: MunizaA. diego added a comment. **Weekly Updates** - @MunizaA has been testing the feasibility and utility of using Wikidata Embeddings, both for Item Quality and Revert Risk. We have studied different implementations, and are experimenting with the PyTorch

[Wikidata-bugs] [Maniphest] T328813: Develop a ML-based service to detect vandalism on Wikidata

2023-03-31 Thread diego
diego added a comment. **Update** - I'm testing a Deep Learning approach, to see if it offers relevant advantages over the current XGBoost model. TASK DETAIL https://phabricator.wikimedia.org/T328813 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T332021: Wikidata Articlequality ORES/ML model needs updating after MUL

2023-03-22 Thread diego
diego added subscribers: Isaac, diego. diego added a comment. @Michael FYI: @Isaac has made interesting progress on Wikidata Item Quality automatic evaluation T321224 <https://phabricator.wikimedia.org/T321224>. Also, I'm leading another effort on vandalism detection on Wikidata T

[Wikidata-bugs] [Maniphest] T328813: Develop a ML-based service to detect vandalism on Wikidata

2023-03-10 Thread diego
diego added a comment. **Update** - New features have slightly improved the accuracy (now 75%); I'm still working on improving the model. TASK DETAIL https://phabricator.wikimedia.org/T328813 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T328813: Develop a ML-based service to detect vandalism on Wikidata

2023-03-05 Thread diego
diego added a comment. **Update** - Currently I'm working on feature engineering. The current model has around 72% accuracy on balanced data. TASK DETAIL https://phabricator.wikimedia.org/T328813 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T328813: Develop a ML-based service to detect vandalism on Wikidata

2023-02-17 Thread diego
diego added a comment. **Update** - Still working on the data evaluation. Currently I'm studying the use of tags and user groups and their relation with reverts. TASK DETAIL https://phabricator.wikimedia.org/T328813 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-22 Thread diego
diego added a comment. I'm trying to implement a link-prediction task on Wikidata, to be used as a proxy for claims coverage. I'm building on top of Goyal & Ferrara <https://arxiv.org/pdf/1705.02801.pdf>'s work. The existing libraries might require some tweaks to work on the ful

[Wikidata-bugs] [Maniphest] T307323: WMDE Machine Learning (ORES)

2022-06-17 Thread diego
diego added a subscriber: Lydia_Pintscher. diego added a comment. Hey @DAbad, as part of this proposal <https://docs.google.com/document/d/1qAF7nJNAMw3yOwoKP2HuvkpuhwjaWs-BX9dSGRTkJc8/edit?usp=sharing>, I'm in conversations with @Lydia_Pintscher and @calbon to develop new

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-04-08 Thread diego
diego added a comment. **Updates** - We finished this project. Results can be found on Meta <https://meta.wikimedia.org/wiki/Research:Identifying_Controversial_Content_in_Wikidata>; the code and models can be found on GitLab <https://gitlab.wikimedia.org/repos

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-04-08 Thread diego
diego closed this task as "Resolved". diego updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Ast

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-03-25 Thread diego
diego added a comment. **Updates** - I was comparing the results when adding anonymous edits; so far I haven't found major differences with the previous results. I'll continue working on this during the next week before my next meeting with Lydia. TASK DETAIL https

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-03-18 Thread diego
diego added a comment. **Updates** - I've presented the main results of this work during the Tuesday Research Sessions; slides can be found here <https://docs.google.com/presentation/d/1JUqUqhlwPwCx6koy5t8oKiYC4flHNEUrpBMUkvt76xE/edit?usp=sharing>. TASK DETAIL

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-03-06 Thread diego
diego added a comment. **Updates** - We met with Lydia and discussed the current results. - We reviewed the results, confirming that most co-edited items correspond to ongoing events, even when we change the time window considered. - Now, I'll be studying the relevance

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-02-18 Thread diego
diego added a comment. **Updates** - No updates TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, karapayneWMDE, Invadibot

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-02-11 Thread diego
diego added a comment. **Updates** - I'm working on identifying collaborative edits on Wikidata items not related to current events. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-02-04 Thread diego
diego added a comment. **Updates** - No updates TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-01-21 Thread diego
diego added a comment. **Updates** - We are now focusing on understanding collaboration patterns: when/how more than one user edits the same item in a given period of time. - We found that in Wikidata such collaborations are less frequent than in other Wikimedia projects. - We also

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-01-16 Thread diego
diego added a comment. **Updates** - I'm organizing the new results to be discussed with the stakeholder. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo, leila

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-01-16 Thread diego
diego moved this task from FY2021-22-Research-Oct-Dec to FY2021-22-Research-Jan-March on the Research board. diego edited projects, added Research (FY2021-22-Research-Jan-March); removed Research (FY2021-22-Research-Oct-Dec). TASK DETAIL https://phabricator.wikimedia.org/T287946 WORKBOARD

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2022-01-07 Thread diego
diego added a comment. **Updates** - I'm focusing on modeling the relationship between topics and collaborations/controversies. - I'm working on graph representation of these components TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-12-24 Thread diego
diego added a comment. **Updates** - We have seen that few items are edited by more than one user. - We are currently researching the item and user characteristics related to collaborative work. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-12-03 Thread diego
diego added a comment. **Updates** - No updates this week. I'm going to meet with the stakeholder next week. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo, leila

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-11-12 Thread diego
diego added a comment. **Updates** - I've been working on a classifier to predict reverts. - The current classifier uses article (item), revision and user information. - On a balanced test set, the current model achieves over 70% accuracy. - However, there is a set

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-11-12 Thread diego
diego updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja, Akuckartz, Nandana

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-11-05 Thread diego
diego added a comment. **Updates** - Working on modeling the reverting behavior. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-10-29 Thread diego
diego added a comment. **Updates** No updates this week. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-10-25 Thread diego
diego added a comment. **Updates** - Preliminary results presented to our stakeholder. - In the coming weeks we will be focusing on a deeper understanding of reverting behavior. **TODO** - Update meta page (within the next 3 weeks) TASK DETAIL https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-10-08 Thread diego
diego added a comment. **Updates** We presented this work at the TTO'21 conference <https://truthandtrustonline.com/>. We received interesting feedback, including questions about the definition of controversial content. Some potential collaboration for a second round on this re

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-10-08 Thread diego
diego moved this task from FY2021-22-Research-July-Sept to FY2021-22-Research-Oct-Dec on the Research board. diego edited projects, added Research (FY2021-22-Research-Oct-Dec); removed Research (FY2021-22-Research-July-Sept). TASK DETAIL https://phabricator.wikimedia.org/T287946 WORKBOARD

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-10-03 Thread diego
diego added a comment. **Updates** - I've started gathering and organizing the different results, to write a first report. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-09-24 Thread diego
diego added a comment. **Updates** I've created a page <https://meta.wikimedia.org/wiki/Research:Identifying_Controversial_Content_in_Wikidata> on Meta about this project. In the following weeks I'll be uploading some of the analysis and main results there. TASK DETAIL

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-09-17 Thread diego
diego added a comment. **Updates** - I've been crunching data to study the "disputed by" qualifier. The plan is to have some statistics on this and compare them with the revert behavior. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENC

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-09-10 Thread diego
diego added a comment. **Updates** No updates this week. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-09-03 Thread diego
diego added a comment. **Updates** - I've been running analysis on the predictability of reverts on Wikidata, including page, user and edit characteristics such as the property and the action summary explained above. - Perhaps not surprisingly, I've found that the user characteristics

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-08-27 Thread diego
diego added a comment. **Updates** - I'm focusing on reverted revisions. - Developed a methodology to characterize Wikidata edits according to different dimensions, such as the property edited, the edit type (from edit summaries), and user characteristics. (popular edit types

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-08-21 Thread diego
diego added a comment. No updates this week. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja, Akuckartz

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-08-03 Thread diego
diego added a comment. @Lydia_Pintscher, regarding your question about the number of users co-editing a Wikidata page, I found that for all edits to namespace 0, in July 2021, considering items that have at least one sitelink: - 84% of pages were edited by just one user. - 14
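
A co-editing breakdown of this kind can be sketched in a few lines of Python. This is only an illustration, not the analysis used in the task: the function name `editor_count_shares`, the `(page_id, user)` input shape and the toy rows are all assumptions, and the sample data is made up rather than taken from the July 2021 edits.

```python
from collections import Counter, defaultdict


def editor_count_shares(edits):
    """Return {number_of_distinct_editors: share_of_pages}.

    `edits` is an iterable of (page_id, user) pairs, one per edit.
    (Hypothetical input shape, chosen for illustration.)
    """
    editors = defaultdict(set)
    for page_id, user in edits:
        editors[page_id].add(user)  # repeated edits by one user count once

    # How many pages have 1 editor, 2 editors, ...
    counts = Counter(len(users) for users in editors.values())
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}


# Toy rows (made up): Q1, Q2 and Q4 have one editor each, Q3 has two.
sample = [
    ("Q1", "A"), ("Q1", "A"),
    ("Q2", "B"),
    ("Q3", "A"), ("Q3", "C"),
    ("Q4", "D"),
]
shares = editor_count_shares(sample)
```

On real data the pairs would first be restricted to the month, namespace and sitelink condition described in the comment.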

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-08-03 Thread diego
diego added a comment. As a very initial exploration, we analyzed a subset of Wikidata items, categorized them by topic, and checked which of them received more **updates**, as a proxy for controversiality. More specifically, - We selected all the Wikidata items with sitelinks

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-08-03 Thread diego
diego triaged this task as "High" priority. TASK DETAIL https://phabricator.wikimedia.org/T287946 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Pablo, leila, Lydia_Pintscher, diego, Aklapper, Invadibot, maantietaja, Akuckart

[Wikidata-bugs] [Maniphest] T287946: Identifying controversial content in Wikidata

2021-08-03 Thread diego
diego created this task. diego added projects: Wikidata, Research (FY2021-22-Research-July-Sept). Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The aim of this project is to identify controversial content in Wikidata. Specifically we will develop the following tasks

[Wikidata-bugs] [Maniphest] T272192: Migrate to new Wikidata Analytics

2021-03-30 Thread diego
diego added a comment. I see. I was asking because we included these addresses in published papers, and those are immutable. But if it is not possible, it is not possible. TASK DETAIL https://phabricator.wikimedia.org/T272192 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T272192: Migrate to new Wikidata Analytics

2021-03-30 Thread diego
diego added a comment. Would it be possible to add redirects from the old URLs to the new ones? TASK DETAIL https://phabricator.wikimedia.org/T272192 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic, diego Cc: diego, WMDE-leszek

[Wikidata-bugs] [Maniphest] T204438: finding statements that need a reference

2021-01-06 Thread diego
diego added a comment. https://dl.acm.org/doi/abs/10.1145/3366424.3383571 TASK DETAIL https://phabricator.wikimedia.org/T204438 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: diego, Hjfocs, Nandana, GoranSMilovanovic, Aklapper

[Wikidata-bugs] [Maniphest] T90881: Framework for checking sources on Wikidata (Does the source actually say what we claim it says?)

2020-12-23 Thread diego
diego added a comment. Hi all, this problem is called Natural Language Inference (NLI), also known as textual entailment. It is a super hot problem now in the NLP community, but imho research is still far away from producing usable tools in the Wikipedia context. This also requires

[Wikidata-bugs] [Maniphest] T155560: Linked fact checker

2020-09-24 Thread diego
diego added a comment. @leila I see some overlap, although this task seems to be broader than the one I'm working on. Given that I don't see much documentation or code about this task, I prefer not to take responsibility for this. TASK DETAIL https://phabricator.wikimedia.org/T155560

[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-21 Thread diego
diego added a comment. I think we are talking about three different things: i) page_id -> CurrentWikidataItem: this was my original request, and I think @JAllemandou's script solves this issue. Having that table updated would be great. ii) revision_id -> CurrentWikidataItem: Th

[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-19 Thread diego
diego added a comment. @JAllemandou, yes. Having this by revision would be great! TASK DETAIL https://phabricator.wikimedia.org/T215616 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Isaac, Tbayer, jcrespo, EBernhardson, Halfak, Nuria, JAllemandou

[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-11 Thread diego
diego added a comment. @Tbayer, great. Thanks. TASK DETAIL https://phabricator.wikimedia.org/T215616 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Tbayer, jcrespo, EBernhardson, Halfak, Nuria, JAllemandou, diego, Nandana, Akovalyov, Banyek, AndyTan

[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-11 Thread diego
diego added a comment. @jcrespo, the API works well for querying specific pages/entities, but not, for example, for finding out which pages that exist in X_wiki are missing on Y_wiki. My point here is that the Wikidata identifier is currently the main identifier for a page/concept, and that this fact
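
The "exists in X_wiki but missing on Y_wiki" question reduces to a set difference over Wikidata item IDs once a bulk table of sitelinks is available. The sketch below is a hypothetical illustration: the function name `missing_on_target`, the `(item_id, wiki, page_title)` row shape and the sample rows are assumptions, not the schema of any table discussed in the task.

```python
def missing_on_target(sitelinks, source_wiki, target_wiki):
    """Return {item_id: source_title} for items that have a page on
    `source_wiki` but none on `target_wiki`.

    `sitelinks` is an iterable of (item_id, wiki, page_title) rows.
    (Hypothetical row shape, chosen for illustration.)
    """
    source_pages = {}
    target_items = set()
    for item_id, wiki, title in sitelinks:
        if wiki == source_wiki:
            source_pages[item_id] = title
        elif wiki == target_wiki:
            target_items.add(item_id)

    # Keep only the source pages whose item has no page on the target wiki.
    return {
        item_id: title
        for item_id, title in source_pages.items()
        if item_id not in target_items
    }


# Made-up rows for illustration only.
rows = [
    ("Q1", "enwiki", "Page one"),
    ("Q1", "eswiki", "Pagina uno"),
    ("Q2", "enwiki", "Page two"),
    ("Q3", "eswiki", "Pagina tres"),
]
gaps = missing_on_target(rows, "enwiki", "eswiki")
```

Keying on the item ID rather than the page title is what makes the comparison work across languages, which is the point the comment makes about the Wikidata identifier.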

[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-11 Thread diego
diego added a comment. @EBernhardson, this looks like exactly what I was looking for, initially. Thank you very much for that. However, I won't close this task, because wikibase_item is still missing the page_id information. Joining by page_title does not seem very 'healthy'. We should keep

[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-11 Thread diego
diego added a comment. Looks good @JAllemandou, thanks. This is a good workaround, but imho, we should have a structure or schema that makes this kind of task easier, especially for outside people without access to a cluster. TASK DETAIL https://phabricator.wikimedia.org/T215616 EMAIL

[Wikidata-bugs] [Maniphest] [Updated] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-11 Thread diego
diego added a project: Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T215616 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: diego Cc: Nuria, JAllemandou, diego, Nandana, Akovalyov, AndyTan, Lahi, Gq86, GoranSMilovanovic, QZanden, Marostegui

[Wikidata-bugs] [Maniphest] [Commented On] T182849: Identify unhelpful file names on commons

2019-02-07 Thread diego
diego added a comment. Hi @chelsyx, check this notebook: apparently the number of white spaces is a pretty good indicator of the filename quality. TASK DETAIL https://phabricator.wikimedia.org/T182849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: chelsyx
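
The whitespace signal mentioned here is cheap to compute. The following is a minimal sketch, not the code from the notebook: the function name `whitespace_count` is hypothetical, and treating underscores as whitespace is an assumption (file names are often written with underscores in place of spaces). No quality threshold is implied.

```python
import os


def whitespace_count(filename):
    """Count word separators in a file name, ignoring the extension.

    Underscores are treated like spaces; this is an assumption made for
    the sketch, not something stated in the original comment.
    """
    stem, _extension = os.path.splitext(filename)
    return sum(1 for ch in stem if ch.isspace() or ch == "_")


# Made-up file names for illustration only.
examples = ["DSC0001.jpg", "IMG_1234.JPG", "Eiffel Tower at night, Paris.jpg"]
counts = {name: whitespace_count(name) for name in examples}
```

A count like this could be used as one feature among others when ranking or classifying file names.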

[Wikidata-bugs] [Maniphest] [Commented On] T178249: Parameter for linking a new page to the Wikidata

2018-09-26 Thread diego
diego added a comment. Hi, Kateryna is working on this: https://meta.wikimedia.org/wiki/Research:Matching_Red_Links_with_Wikidata_Items Please ping or write something on the discussion page if you want to know more about that project. TASK DETAIL https://phabricator.wikimedia.org/T178249 EMAIL