daniel added a comment.

For developing the feature, we'll use a simple weighted sum based on the probability reported via the wbs_propertypairs table to get signal for completeness. Items with all high probability statements complete should be more likely to *be* complete than items that lack high probability statements.

So, let's see if I get this right.

  • Let's say Q5 has statements about P3 and P5. Based on that, we compute probabilities for P1, P2, P3, P4, P5, P6, and P7. The probability of P1, p(P1), is the scaled sum of the co-occurrence probabilities: p(P1) = sum( co(P1, P1), co(P1, P2), co(P1, P3)... co(P1, P9) ) / 9. It would perhaps be more semantically useful to use a maximum here, but that's not what we currently do.
  • Currently, the API would output probabilities for P1, P2, P4, P6, and P7, ordered by probability.
  • If I understand correctly, what you want are just the probabilities for P3 and P5, the properties the item already has. No need to get all of them! So just limit the output to those probabilities. Or even the sum of these - that's what you really want, right? If the output filtering is inverted instead of omitted, you get much smaller results, and you will probably not need paging/continuations. But there is a semantic snag here. The co-occurrence probability of anything with itself is 1. So an item that has only one property, P1, will give you p(P1) = 1, and the total completeness score would also be 1, meaning 100% perfect. That's not what you meant, is it?
  • Or maybe you want the sum of the properties missing - an incompleteness score? That seems more useful, but it's not what I gather from what you wrote. But you could get that from the current API output: just sum the probabilities of the suggestions you get! If you want a completeness score, just use 1/n.
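
A minimal sketch of the scoring discussed in the list above, in Python. It makes several assumptions: the TABLE dict is a hypothetical stand-in for values that would come from wbs_propertypairs (the numbers are invented), the average runs over the properties already present on the item (one reading of the formula, not confirmed here), and all function names are made up for illustration. It is not the actual suggester code.

```python
from typing import Dict, List, Tuple

# Hypothetical co-occurrence values, keyed as (candidate, existing property).
# Invented numbers; real values would come from wbs_propertypairs.
TABLE: Dict[Tuple[str, str], float] = {
    ("P1", "P3"): 0.5, ("P1", "P5"): 0.25,
    ("P2", "P3"): 0.25, ("P2", "P5"): 0.25,
    ("P3", "P5"): 0.5, ("P5", "P3"): 0.5,
}


def co(a: str, b: str, table: Dict[Tuple[str, str], float]) -> float:
    """Co-occurrence of a with b; anything co-occurs with itself with probability 1."""
    if a == b:
        return 1.0
    return table.get((a, b), 0.0)


def probability(candidate: str, present: List[str],
                table: Dict[Tuple[str, str], float]) -> float:
    """Scaled sum (average) of co-occurrence with each property on the item.

    Assumption: the sum runs over the item's existing properties.
    """
    if not present:
        return 0.0
    return sum(co(candidate, p, table) for p in present) / len(present)


def incompleteness(present: List[str], all_props: List[str],
                   table: Dict[Tuple[str, str], float]) -> float:
    """Sum of the probabilities of the properties the item is missing."""
    missing = [p for p in all_props if p not in present]
    return sum(probability(m, present, table) for m in missing)


# Item with P3 and P5: the missing properties P1 and P2 are the suggestions.
item = ["P3", "P5"]
print(probability("P1", item, TABLE))                        # 0.375
print(incompleteness(item, ["P1", "P2", "P3", "P5"], TABLE))  # 0.625

# The snag: an item with only P1 scores p(P1) = 1 against itself.
print(probability("P1", ["P1"], TABLE))                       # 1.0
```

The last line shows why summing the probabilities of the properties an item already has is misleading: a single-property item always looks perfectly complete. Summing over the missing properties avoids that, because the self co-occurrence term never enters the sum.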

TASK DETAIL
https://phabricator.wikimedia.org/T164994

To: Glorian_WD, daniel
Cc: Sjoerddebruin, daniel, aude, WMDE-leszek, hoo, Lydia_Pintscher, Halfak, Glorian_WD, Aklapper, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, Sethakill, Lewizho99, Maathavan, dg711, Izno, Wikidata-bugs, jayvdb, Anomie, Mbch331, Legoktm