[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-08-03 Thread Isaac
Isaac added a comment. I'm going to be out the next several weeks so FYI likely won't hear updates until mid-September on this. Thanks for these additional details though! > Now there are several Properties that can represent such relations. The main ones we should probably focus on are

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-28 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. Thanks for this! So in general it is pretty important for Items to be classified and put into the right place in the larger ontology. So these statements do imho deserve some sort of special status as they are generally more important than other

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-25 Thread Isaac
Isaac added a comment. > That's quite an interesting table! Would it be possible to get the actual Item IDs for the last two rows? It could be instructive to know which Items the model thinks are very incomplete but have excellent quality :) @Michael thanks for the questions! Some

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-24 Thread Michael
Michael added a comment. In T321224#9035684, @Isaac wrote: > Oooh and the job worked! High-level data on overlap between the two scores where they are the same except completeness just takes into account how many of the expected

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-21 Thread Isaac
Isaac added a comment. Oooh and the job worked! High-level data on overlap between the two scores where they are the same except completeness just takes into account how many of the expected claims/refs/labels are there and quality adds the total number of claims to the features too:

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-21 Thread Isaac
Isaac added a comment. Updates: - Finally ported all the code from the API to work on the cluster. I don't know if it'll run to completion yet but I ran it on a subset and the results largely matched the API:

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-06-30 Thread Isaac
Isaac added a comment. Updates: - Wrestling with re-adapting everything to the cluster but making good progress. One of the main challenges is that the Wikidata item schema is different between cluster and API so lots of little errors that I'm having to discover and correct as I make

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-06-23 Thread Isaac
Isaac added a comment. Updates: - Successfully generated the property data I need so now I have the necessary data to run the model in bulk on the cluster and can turn towards generating a dataset for sampling. Notebook:

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-06-16 Thread Isaac
Isaac added a comment. Updates: - Began process of regenerating property-frequency table on cluster given that we shouldn't depend on RECOIN for bulk computation even if it greatly simplifies the API prototype. Working out a few bugs but feel like I have the right approach and

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-05-12 Thread Isaac
Isaac added a comment. No updates still with prep for wikiworkshop/hackathon but after next week, hoping to get back to this! TASK DETAIL https://phabricator.wikimedia.org/T321224 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Isaac Cc:

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-04-11 Thread Isaac
Isaac added a comment. From discussion with Lydia/Diego: - The concept of `completeness` feels closer to what we want than `quality` -- i.e. allowing for more nuance in how many statements are associated with a given item. We came up with a few ideas for how to make assessing item

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-04-03 Thread leila
leila moved this task from FY2022-23-Research-January-March to In Progress on the Research board. leila edited projects, added Research; removed Research (FY2022-23-Research-January-March).

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-04-03 Thread leila
leila added a parent task: T333892: Develop a new generation of ML models for Wikidata.

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-04-03 Thread leila
leila removed a parent task: T293478: Content Tagging Models.

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-24 Thread Isaac
Isaac added a comment. Updated API to be slightly more robust to instance-of-only edge cases and provide the individual features. Output for https://wikidata-quality.wmcloud.org/api/item-scores?qid=Q67559155: { "item": "https://www.wikidata.org/wiki/Q67559155", "features":

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-17 Thread Isaac
Isaac added a comment. I still need to do some checks because I know e.g., this fails when the item lacks statements, but I put together an API for testing the model. It has two outputs: a quality class (E worst to A best) that uses the number of claims on the item as a feature (along with

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-10 Thread Isaac
Isaac added a comment. Weekly updates: - Discussed with Diego the challenge of whether our annotated data is really assessing what we want it to. I'll try to join the next meeting with Lydia to hear more and figure out our options. - Diego is also considering how embeddings might help

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-03 Thread Isaac
Isaac added a comment. I slightly tweaked the model but also experimented with adding just a simple square-root of the number of existing claims to the model and found that that is essentially all that is needed to almost match ORES quality (which is near perfect) for predicting item

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-02-16 Thread Isaac
Isaac added a comment. Weekly update: - I cleaned up the results notebook. The original ORES model does better on the labeled data than my initial model. This isn't a big

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-02-10 Thread Isaac
Isaac added a comment. > Recoin I believe didn't exist at that point. It was also not integrated in the existing production systems. I don't think we ever did a proper analysis of what it's currently capable of and how good it is for judging Item quality. Thanks -- useful context. I'll

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-02-03 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. In T321224#8521681, @Isaac wrote: > @Lydia_Pintscher I was reminded recently of Recoin (and the closely related PropertySuggester

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-01-27 Thread Isaac
Isaac added a comment. I started a PAWS notebook where I will evaluate the proposed strategy (Recoin with the addition of reference/label rules) against the 2020 dataset (~4k items) of assessed Wikidata item qualities. This will allow me to relatively cheaply assess the method before trying

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-01-24 Thread Isaac
Isaac moved this task from FY2022-23-Research-October-December to FY2022-23-Research-January-March on the Research board. Isaac edited projects, added Research (FY2022-23-Research-January-March); removed Research (FY2022-23-Research-October-December).

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-01-12 Thread Isaac
Isaac added a subscriber: Lydia_Pintscher. Isaac added a comment. @Lydia_Pintscher I was reminded recently of Recoin (and the closely related PropertySuggester) and that got me

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-22 Thread diego
diego added a comment. I'm trying to implement a link-prediction task on Wikidata, to be used as proxy for claims coverage. I'm building on top of Goyal & Ferrara's work. The existing libraries might require some tweaks to work on the full Wikidata

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-22 Thread Isaac
Isaac added a comment. Weekly updates: - I focused on the references component of the model this week. I built heavily on Amaral, Gabriel, Alessandro Piscopo, Lucie-Aimée Kaffee, Odinaldo Rodrigues, and Elena Simperl. "Assessing the quality of sources in Wikidata across languages: a

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-16 Thread Isaac
Isaac added a comment. Able to start thinking about this again and a few thoughts: - Machine-in-the-loop: when we built quality models for the Wikipedia language communities, it was with the idea that the models could potentially support the existing editor processes for assigning

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-02 Thread Isaac
Isaac added a comment. Update: past few weeks have been busy so I haven't had a chance to look into this but I'm hoping to get more time in December to focus on it.

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-11-14 Thread AOdit_WMF
AOdit_WMF added a project: Linked-Open-Data-Network-Program.

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-11-04 Thread Isaac
Isaac added a comment. Weekly update: - Summarizing some past research shared / further examinations of the existing ORES model shared by LP: - We have to be careful to adjust expectations for a given claim depending on its property type (distribution of property types on Wikidata

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-10-30 Thread Lydia_Pintscher
Lydia_Pintscher added a project: Wikidata.